ANALYSIS OF FACTORIAL ARRANGEMENTS WHEN
THE DATA ARE PROPORTIONS 


G. V. Dyke and H. D. Patterson
Rothamsted Experimental Station 


1. The statistician is often confronted with data in which the meas- 
urement of interest is the proportion of units possessing a certain 
attribute. Such data can arise in factorial arrangements in either de- 
signed experiments or survey work. An example of the former is given 
by an experiment on a number of varieties of bulbs which have been 
infested with eelworm. Treatments of hot water at two temperatures 
are applied for two lengths of time, these treatments together with two 
sizes of bulb forming a $2^3$ arrangement. The observations are the
proportions of bulbs blooming in the different categories. A second example
is taken from survey work. In a survey on virus diseases in potatoes 
the data may be recorded separately for each combination of a number 
of factors which may affect the data e.g. varieties, age of seed (new or 
old), and height of the field above sea level (above or below say 500 
feet). Here the observations would be the proportion of diseased plants. 

In a factorial experiment on yield data, the main effects and
interactions are estimated together with their standard errors. The
estimates of the effects are given by linear functions of the observations.
This paper is concerned with the corresponding analysis of factorial 
arrangements when the data are proportions. 


2. If the proportions are all near to one-half, the ordinary methods 
of analysis applicable to yield data are still available, including the 
estimation of effects from differences between weighted or unweighted 
means. The analysis is then an approximation but sufficiently good for 
practical purposes. In other cases it is not appropriate to make com- 
parisons by examining linear functions of the proportions. Suppose, 
for example, that two men firing on a target achieve percentage hits of 
10 and 5. Under a change of conditions, such as lengthening of the 
range, the rate for the first man is reduced to 5 per cent. We do not
then expect a decrease of 5 per cent, i.e. to zero, in the rate for the 
second man. 

In such circumstances it is necessary to specify some other model. 
Thus in the above example we might expect a proportionate decrease 
to one half of the original percentage. A departure from proportionality 
would be described as an interaction between the effects of the marksmen 
and the range. Whilst such a rule will not be of universal applicability 
it is often likely to represent a good approximation to the truth. 

The method of maximum likelihood will be used to obtain estimates 
of the effects of factors. The solution is considerably simplified by 
transforming the data to a scale in which the effects follow a linear law. 


3. The choice of transformation depends on the specified model. It 
is unlikely that the correct choice will be indicated by the data
themselves. In general, therefore, a transformation has to be chosen on
a priori grounds, or for convenience of reference to existing tables.

It is desirable that the transformation to be applied should satisfy 
certain simple conditions. Firstly, the range of the transformed scale
should be from $-\infty$ to $+\infty$. Secondly, it is desirable that the
transformed scale should be symmetrical. The "z" transformation

$$z = \tfrac{1}{2}\log_e (p/q), \tag{1}$$

where the p are observed proportions and q = 1 − p, ranges from $-\infty$
to $+\infty$ and the interchanging of p, q merely involves a change of sign.
The property of proportionality mentioned in the example of the two 
marksmen holds approximately for small p. Incidentally, the model
proposed by Bartlett (1935) for zero interactions in contingency tables
leads to the "z" transformation. The transformation (1) will be used
in this paper. If P, Q are the expected proportions and $Z = \tfrac{1}{2}\log_e (P/Q)$,
and if a linear law is operative on the transformed scale, we can write
Z as a linear function of s independent parameters $\theta_k$ representing main
effects and interactions. We have

$$Z_i = \sum_{k=1}^{s} l_{ik}\,\theta_k, \tag{2}$$

where i refers to a class defined by a particular combination of
treatments and k takes all values from 1 to s.
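
The approximate proportionality for small p can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names `z_transform` and `inverse_z` are illustrative, and the calculation simply assumes that the effect of range observed for the first marksman carries over additively on the z scale to the second.

```python
import math

def z_transform(p):
    """z = 0.5 * log_e(p / q) with q = 1 - p, as in equation (1)."""
    return 0.5 * math.log(p / (1.0 - p))

def inverse_z(z):
    """Proportion P corresponding to a value Z on the transformed scale."""
    return 1.0 / (1.0 + math.exp(-2.0 * z))

# Marksmen example: lengthening the range takes the first man from 10% to 5%.
range_effect = z_transform(0.05) - z_transform(0.10)

# Apply the same additive effect on the z scale to the second man (5% hits).
second_man_after = inverse_z(z_transform(0.05) + range_effect)
print(round(second_man_after, 4))  # about 0.024, roughly half of 5%
```

On this scale the second man's rate falls to a little under 2.5 per cent rather than to zero, which is the behaviour the text argues for.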

Other transformations such as the probit or angular could also be 
used. The scale of the angular transformation is finite, so that a linear 
law cannot hold exactly. If, however, in a designed experiment with 
equal numbers of units in each class the proportions follow a binomial
distribution, the transformed values have equal variances and the analy- 
sis is relatively simple. Jolly (1950) has discussed the use of various 
transformations in his paper on the combination of percentage kills 
obtained in an experiment repeated on several occasions. He showed 
how to fit constants for treatments and occasions for the case in which 
there is no interaction between treatments and occasions. 


4. An example of the inadequate procedures which have sometimes 
been applied was provided in this journal by Lombard and Doering’s 
(1947) treatment of data on cancer knowledge. Each one of 1729 indi- 
viduals, classified according to the presence or absence of four factors, 
was allotted either a good score or a poor score on cancer knowledge. 
Lombard and Doering have arranged their data in contingency tables 
and have applied two methods of analysis which essentially test the 
degree of association or correlation of good scores with each of the 
factors in turn, making adjustments for the remaining factors. The 
strict validity of these methods is open to question. Thus in one method 
product-moment correlation coefficients are calculated from two-way 
tables and from these partial correlation coefficients are derived. The 
latter coefficients were transformed to Fisher’s “z’’ and tested for sig- 
nificance. This procedure is however strictly applicable only when the 
correlation coefficients are calculated from data following a multivariate 
normal distribution. 

Apart from questions of validity, the methods of Lombard and 
Doering suffer from the grave defect that estimates of the magnitudes 
of the various effects are not obtained. 

On the other hand the method of the present paper allows the 
efficient estimation of the magnitude of effects and their standard errors 
as well as the application of significance tests. It is also possible by an 
extension of the method to obtain estimates (with standard errors) of 
the magnitude of particular interactions which it may be thought neces- 
sary to isolate. Lombard and Doering provide neither estimates nor 
tests of significance of interactions, and the extension of their methods 
in this direction is not obvious. The method about to be described pro- 
vides a simple test of the goodness of fit of the hypothesis adopted, in the 
form of a $\chi^2$ test.


5. The mathematics will not be given in detail as the method follows 
closely that used for other transformations (see, for example, Cochran 
1940). Suppose the observed proportions in the various classes
determined by the factorial combinations are $p_i$ and the expected
proportions are $P_i$. In the i-th class the probability that the proportion
is $x_i/n_i$, where $n_i$ is the number of units in the class, is

$$\frac{n_i!}{x_i!\,(n_i - x_i)!}\,P_i^{x_i}\,Q_i^{\,n_i - x_i}. \tag{3}$$

The maximum likelihood estimates of the s parameters $\theta_k$, where s is
less than the total number of classes, must satisfy the s equations

$$\frac{\partial L}{\partial \theta_k} = 0, \qquad k = 1, \ldots, s, \tag{4}$$

where L denotes the logarithm of the likelihood formed from the probabilities (3).

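Now let $z_i$ be the transformed values obtained from the observed
$p_i$. If $\theta'_k$ are first approximations to $\theta_k$, and $Z'_i$ and $R'_i$ $(= 2P'_i - 1)$
are determined from these $\theta'_k$ using equations (1) and (2), then second
approximations to the maximum likelihood estimates are given by the
$\theta_k$ of the s equations

$$\sum_{k} \theta_k \sum_{i} n_i w'_i\, l_{ij}\, l_{ik} = \sum_{i} n_i w'_i z'_i\, l_{ij}, \tag{5}$$

where j, k = 1 to s and

$$w'_i = 1 - R_i'^2 = 4P'_iQ'_i. \tag{6}$$

The quantity

$$z'_i = Z'_i + \frac{r_i - R'_i}{1 - R_i'^2}, \tag{7}$$

in which $r_i$ is equal to $2p_i - 1$, is described as the working z and has
maximal and minimal values of

$$Z'_i + \frac{1}{1 + R'_i} \quad \text{and} \quad Z'_i - \frac{1}{1 - R'_i}$$

for $p_i = 1$ and $p_i = 0$ respectively. The above expressions are the same
as those given by Fisher and Yates (1948).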
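
The working z and its weight are easily computed once a provisional Z' is available. The sketch below is illustrative only (the function names are not from the paper); it relies on the identity R' = 2P' − 1 = tanh Z', which follows from Z' being half the natural logarithm of P'/Q', and it uses the values Z' = .44, r = .48 quoted later for the class abcd.

```python
import math

def working_z(Z_prime, r):
    """Return (working z, weighting coefficient w') for a provisional Z'.

    R' = 2P' - 1 equals tanh(Z') because Z' = 0.5 * log_e(P'/Q').
    """
    R = math.tanh(Z_prime)
    w = 1.0 - R * R                      # w' = 1 - R'^2
    return Z_prime + (r - R) / w, w      # z' = Z' + (r - R') / (1 - R'^2)

def working_z_bounds(Z_prime):
    """Minimal (p = 0) and maximal (p = 1) working z for a provisional Z'."""
    R = math.tanh(Z_prime)
    return Z_prime - 1.0 / (1.0 - R), Z_prime + 1.0 / (1.0 + R)

z_work, weight = working_z(0.44, 0.48)
print(round(z_work, 2), round(weight, 2))
```

For the class abcd this gives a working z of about .52 and a weight of about .83, in agreement with the tabulated values.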

If $P_i$ are the proportions calculated from the final fitted $\theta_k$ then,
unless the expected frequencies are small,

$$\chi^2 = \sum_i \frac{n_i (r_i - R_i)^2}{1 - R_i^2} \tag{8}$$

is distributed as $\chi^2$ with degrees of freedom depending on the number
of parameters which have been estimated. A significant value of $\chi^2$
may arise through heterogeneity among the sampled units or inadequate 
representation of the true situation by the model defined in (2). For 
example if only main effects were estimated, a significant value would 
suggest, apart from the possibility of heterogeneity mentioned above, 
the existence of some interactions. 

Variances and covariances of the estimates can usually be taken as 
equal to the elements of the inverse matrix required in the solution of 
equations (5). Standard errors can be calculated when these have been 
found and the estimates of $\theta_k$ tested in the normal distribution. If,
however, there is evidence of heterogeneity it is common practice to multiply
these standard errors by the square root of $\chi^2/f$, where $\chi^2$ is estimated
with f degrees of freedom. Tests of significance are then made in the 
t-distribution with f degrees of freedom. 
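
The heterogeneity adjustment amounts to a single scaling factor. A minimal sketch follows; the function name and the numerical values (a $\chi^2$ of 24.0 on 12 degrees of freedom, and the two standard errors) are hypothetical and chosen only to illustrate the arithmetic.

```python
import math

def heterogeneity_adjusted_se(standard_errors, chi_square, df):
    """Multiply each standard error by sqrt(chi_square / df)."""
    factor = math.sqrt(chi_square / df)
    return [se * factor for se in standard_errors]

# Hypothetical values: chi-square of 24.0 with f = 12 degrees of freedom.
adjusted = heterogeneity_adjusted_se([0.050, 0.030], 24.0, 12)
print([round(se, 4) for se in adjusted])
```

The adjusted standard errors would then be referred to the t-distribution with f degrees of freedom, as described above.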


6. Consider in particular the case of a $2^n$ factorial arrangement. It
is convenient to let $\theta_1$, say, represent the mean effect and $2\theta_2, \ldots, 2\theta_s$
the main effects and interactions of the factors. If this is done the $l_{ik}$
of equations (2) are all either +1 or −1. For example if there are two
factors A, B (the presence of these factors being denoted by a, b) and
$2\theta_2$, $2\theta_3$, $2\theta_4$ are respectively the main effects of A and B and the
interaction AB then the Z for the various combinations are:


a      $\theta_1 + \theta_2 - \theta_3 - \theta_4$
b      $\theta_1 - \theta_2 + \theta_3 - \theta_4$
ab     $\theta_1 + \theta_2 + \theta_3 + \theta_4$
(1)    $\theta_1 - \theta_2 - \theta_3 + \theta_4$


Similar expressions can be worked out for more complicated cases (see,
for example, Yates, 1937).
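
The signs for any $2^n$ arrangement follow one rule: the coefficient of an effect in a given treatment combination is the product, over the letters of that effect, of +1 if the letter is present in the combination and −1 if it is absent. The sketch below illustrates the rule; the function names are invented for the illustration.

```python
from itertools import combinations

def effect_terms(factors):
    """Mean (empty string), main effects and interactions in standard order."""
    terms = [""]
    for size in range(1, len(factors) + 1):
        terms += ["".join(c) for c in combinations(factors, size)]
    return terms

def sign(treatment, term):
    """Product over the letters of `term`: +1 if present in `treatment`, else -1."""
    s = 1
    for letter in term:
        s *= 1 if letter in treatment else -1
    return s

def sign_table(factors):
    """Map each treatment combination to its row of +1/-1 coefficients."""
    terms = effect_terms(factors)
    table = {}
    for size in range(len(factors) + 1):
        for combo in combinations(factors, size):
            name = "".join(combo) or "(1)"
            table[name] = [sign(combo, term) for term in terms]
    return terms, table

terms, table = sign_table("ab")
for name in ("a", "b", "ab", "(1)"):
    print(name, table[name])
```

For two factors the rows reproduce the pattern of signs in the scheme above, with the columns in the order mean, A, B, AB.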

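Usually it is possible to assume that some of the interactions are
negligible. The corresponding $\theta$ are then given the value zero. First
approximations to the $\theta_k$ should be obtained from the observed $z_i$ using
the methods given by Yates (1937) ignoring the different variances of
the $z_i$. The $Z'_i$ are then calculated from these approximations using
equations (2) and hence the working z (equations (7)).

If the factors are denoted by a, b, c, ..., the maximum likelihood
estimates of mean and main effects by $\mu, 2\alpha, 2\beta, 2\gamma, \ldots$ and the maximum
likelihood estimates of interactions by $2i(\alpha\beta), 2i(\alpha\gamma), \ldots$ then equations
(5) may be written

$$\begin{pmatrix}
G & A & B & \cdots & AB & \cdots \\
A & G & AB & \cdots & B & \cdots \\
B & AB & G & \cdots & A & \cdots \\
\vdots & \vdots & \vdots & & \vdots & \\
AB & B & A & \cdots & G & \cdots \\
\vdots & \vdots & \vdots & & \vdots &
\end{pmatrix}_{nw'}
\begin{pmatrix} \mu \\ \alpha \\ \beta \\ \vdots \\ i(\alpha\beta) \\ \vdots \end{pmatrix}
=
\begin{pmatrix} G \\ A \\ B \\ \vdots \\ AB \\ \vdots \end{pmatrix}_{nw'z'}. \tag{9}$$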


The suffixes nw’ and nw’z’ attached to the matrix and right hand 
vector refer to all the elements of the matrix and vector. 

The $G_{nw'}$, $A_{nw'}$, $(AB)_{nw'}$, etc. and the $G_{nw'z'}$, $A_{nw'z'}$, etc. are
calculated from the nw' and nw'z' in the same way that the mean effect,
main effects and interactions are calculated from yield data but without
the divisor $2^n$ for the mean effect or $2^{n-1}$ for other effects. Thus $G_{nw'}$,
$G_{nw'z'}$ are, respectively, the totals of all values nw' and nw'z'. Similarly
$A_{nw'}$ and $A_{nw'z'}$ are the total differences between values with and
without factor a, and so on. For example in a $2^4$ arrangement $(AB)_{nw'}$
is the difference between the sum of the nw' for the factorial
combinations


abcd, abc, abd, ab, cd, c, d, (1)
and the sum of the nw' for
acd, ac, ad, a, bcd, bc, bd, b.


The matrix relationship (9) represents a set of linear equations of which 
the first is: 


$$\mu G_{nw'} + \alpha A_{nw'} + \beta B_{nw'} + \cdots + i(\alpha\beta)(AB)_{nw'} + \cdots = G_{nw'z'}.$$


The symbols of the right hand vector of (9) correspond to the 
estimates in the first vector. The symbols of the matrix are obtained 
by multiplying a row vector and a column vector each consisting of 
the symbols 1, A, B, ..., AB in the appropriate order, and writing
$A^2 = 1$, $B^2 = 1$ etc. and replacing 1 by G.
7. The method just described has been applied to data taken from
the paper by Lombard and Doering (1947). The 1729 individuals were
classified according to the presence or absence of four factors. These
factors, as described by Lombard and Doering, and our symbols, are


a    newspapers
b    radio
c    solid reading
d    lectures


Each individual was allotted either a good score or a poor score on 
cancer knowledge. Here it is proposed to estimate the effects of the 
recorded factors on the observed proportions achieving good scores. 

The various factorial combinations are listed in column (1) of 
Table I. The total numbers in each class are given in column (2) and 
the number with good scores in column (3). The values r of column (4),
are found directly from the proportions as 2p — 1. It is fairly obvious 
from a cursory examination of these r that some effects do exist. Thus 
the r in six of the classes involving newspapers are positive whilst the 
r in six of the classes not involving this factor are negative. 


Main effects

First estimates (from column (5)):

    μ' = −.16  (mean)
    α' =  .25  (newspapers)
    β' =  .12  (radio)
    γ' =  .14  (solid reading)
    δ' =  .09  (lectures)

Second estimates (from solution of equations (9)):

    μ = −.14
    α =  .16
    β =  .08
    γ =  .25
    δ =  .10


The transformed values z are given in the next column. These are
obtainable directly from Table VII of Fisher and Yates (1948).
Alternatively reference can be made to Finney (1947) who has tabulated
logits, equal to 5 + z, for given p and in addition has given tables for
the maximal and minimal working logits and weighting coefficients.


TABLE I. EXAMPLE. CLASSIFICATION OF INDIVIDUALS WITH GOOD MARKS AND
COMPUTATIONS FOR MAIN EFFECTS

(1)     (2)   (3)    (4)    (5)    (6)    (7)     (8)     (9)     (10)      (11)
         n    pn      r      z     Z'     R'    1−R'²     z'      nw'      nw'z'
abcd     31    23    .48    .52    .44    .41     .83    .52     25.79     13.41
abc     169   102    .21    .21    .27    .26     .93    .22    157.58     34.67
abd      12     8    .33    .34    .16    .16     .97    .33     11.69      3.86
ab       94    35   −.26   −.27   −.01   −.01    1.00   −.26     93.99    −24.44
acd      45    27    .20    .20    .20    .20     .96    .20     43.20      8.64
ac      378   201    .06    .06    .03    .03    1.00    .06    377.66     22.66
ad       13     7    .08    .08   −.08   −.08     .99    .08     12.92      1.03
a       231    75   −.35   −.37   −.25   −.24     .94   −.37    217.69    −80.55
bcd       4     1   −.50   −.55   −.06   −.06    1.00   −.50      3.99     −1.99
bc       32    16    .00    .00   −.24   −.24     .94    .01     30.16       .30
bd        7     4    .14    .14   −.34   −.33     .89    .19      6.24      1.19
b        63    13   −.59   −.68   −.51   −.47     .78   −.66     49.08    −32.40
cd       11     3   −.45   −.48   −.30   −.29     .92   −.47     10.07     −4.74
c       150    67   −.11   −.11   −.48   −.45     .80   −.05    119.63     −5.98
d        12     2   −.67   −.81   −.58   −.52     .73   −.79      8.76     −6.92
(1)     477    84   −.65   −.78   −.75   −.64     .59   −.77    281.62   −216.85

First approximations μ', α', β', γ', δ' are estimated from the z and
given in Table I. For example α' is half of the mean difference between
the z's involving a and those not involving a. The Z' are calculated
from these approximations (column (6)). For example

$$Z'(abd) = -.16 + .25 + .12 - .14 + .09 = .16$$

The next step involves finding the working z and weights corresponding
to the Z'. Use is made of the formulae (6) and (7). Thus for the
combination abcd, R' is calculated from Z' using the table given by
Fisher and Yates. In column (9) z' is given by

$$z' = .44 + \frac{.48 - .41}{.83} = .52$$

whilst nw' in column (10) is 31 × .83. These values can also be
obtained from Finney's tables remembering that working logits are 5 larger
than the working z.


Second approximations μ, α, β, γ, δ are obtained from the solution
of the equations

$$\begin{pmatrix}
1450.06 & 430.98 & -693.04 & 86.08 & -1204.75 \\
 & 1450.06 & -31.81 & 449.78 & -301.89 \\
 & & 1450.06 & 26.93 & 638.55 \\
 & & & 1450.06 & 0.82 \\
 & & & & 1450.06
\end{pmatrix}
\begin{pmatrix} \mu \\ \alpha \\ \beta \\ \gamma \\ \delta \end{pmatrix}
=
\begin{pmatrix} -288.10 \\ 246.67 \\ 277.29 \\ 422.04 \\ 317.06 \end{pmatrix}$$


These correspond to equations (9). For example $(AB)_{nw'}$ is
25.79 + 157.58 + ... − 43.20 − 377.66 ... − 3.99 ...
+ 10.07 ... + 281.62 = −31.81
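
The elements of these equations can be reproduced from columns (10) and (11) of Table I. The sketch below is illustrative and not the authors' computing scheme: it forms each total or difference by the sign rule for $2^n$ arrangements, and obtains the matrix symbol for row j and column k by multiplying the two effect symbols with $A^2 = 1$, etc. The helper names are assumptions made for the illustration.

```python
# (treatment, n w', n w' z') taken from columns (10) and (11) of Table I.
ROWS = [
    ("abcd", 25.79, 13.41), ("abc", 157.58, 34.67), ("abd", 11.69, 3.86),
    ("ab", 93.99, -24.44), ("acd", 43.20, 8.64), ("ac", 377.66, 22.66),
    ("ad", 12.92, 1.03), ("a", 217.69, -80.55), ("bcd", 3.99, -1.99),
    ("bc", 30.16, 0.30), ("bd", 6.24, 1.19), ("b", 49.08, -32.40),
    ("cd", 10.07, -4.74), ("c", 119.63, -5.98), ("d", 8.76, -6.92),
    ("(1)", 281.62, -216.85),
]
TERMS = ["", "a", "b", "c", "d"]  # mean and the four main effects

def contrast(term, values):
    """Signed total: each value enters with the product of +1/-1 over the term's letters."""
    total = 0.0
    for treatment, value in values:
        s = 1
        for letter in term:
            s *= 1 if letter in treatment else -1
        total += s * value
    return total

def multiply(term1, term2):
    """Product of two effect symbols with A^2 = 1, B^2 = 1, etc."""
    return "".join(sorted(set(term1) ^ set(term2)))

nw = [(t, w) for t, w, _ in ROWS]
nwz = [(t, wz) for t, _, wz in ROWS]
matrix = [[contrast(multiply(tj, tk), nw) for tk in TERMS] for tj in TERMS]
rhs = [contrast(t, nwz) for t in TERMS]

print(round(matrix[0][0], 2), round(matrix[1][2], 2), round(rhs[0], 2))
```

Because the tabulated columns are rounded to two decimals, the totals agree with the printed matrix only to within a unit or so in the last place.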


The inverse of the matrix, which is symmetrical, was found by the
square root method of Dwyer (1945):

$$\begin{pmatrix}
0.00250888 & -0.00033639 & 0.00037913 & -0.00005268 & 0.00184749 \\
 & 0.00085142 & -0.00011494 & -0.00024196 & -0.00005147 \\
 & & 0.00091827 & -0.00000385 & -0.00011331 \\
 & & & 0.00076793 & -0.00009288 \\
 & & & & 0.00226381
\end{pmatrix}$$

and the vector of estimates is therefore

$$\begin{pmatrix} -.137124 \\ .156626 \\ .079502 \\ .249074 \\ .102184 \end{pmatrix}$$


Values of Z, R were calculated from these estimates just as the 
Z’, R’ were obtained from the first approximations and are shown in 
Table II. The value of $\chi^2$ is 13.85 and has 11 degrees of freedom, i.e.
16 minus one degree of freedom for every parameter estimated. The
$\chi^2$ is little above expectation. The elements of the inverse matrix are
used as estimates of the covariances and variances of the estimates of
the parameters. We have


μ = −0.14 ± .050     β = 0.08 ± .030
α =  0.16 ± .029     γ = 0.25 ± .028
δ =  0.10 ± .048
TABLE II. EXAMPLE. COMPUTATIONS FOR THE COMPARISON OF OBSERVED AND
EXPECTED PROPORTIONS


Treatment      Z        R      n(r − R)²/(1 − R²)
abcd          0.45     .42       .14
abc           0.25
abd          −0.05    −.05      1.74
ab           −0.25    −.24       .04
acd           0.29     .28       .31
ac            0.09     .09       .34
ad           −0.21    −.21      1.14
a            −0.41    −.39
bcd           0.14     .13      1.61
bc           −0.07    −.07       .16
bd           −0.36    −.35      1.92
b            −0.57    −.52       .42
cd           −0.02    −.03      1.94
c            −0.23    −.23      2.28
d            −0.52    −.49       .51
(1)          −0.72    −.62       .70


χ² = 13.85 (11 d.f.)


The estimate of the effect due to lectures is rather inaccurate. This is 
not surprising as a relatively small number of individuals attended 
lectures. A further cycle of the above operations using the values of 
μ, α, β, γ, δ as starting points yielded new estimates which were identical
to the second decimal place. Usually a single cycle is sufficient for 
practical purposes. 
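
The whole cycle can be sketched in a few lines for readers who wish to repeat the calculation by machine. This is an illustration under stated assumptions rather than the authors' procedure: it starts from zero instead of from the Yates first approximations, repeats the cycle a fixed number of times, and solves equations (5) by ordinary elimination instead of the square root method. All function names are invented for the sketch; the counts are those of columns (2) and (3) of Table I.

```python
import math

# (treatment, n, number with good scores) from columns (1)-(3) of Table I.
DATA = [("abcd", 31, 23), ("abc", 169, 102), ("abd", 12, 8), ("ab", 94, 35),
        ("acd", 45, 27), ("ac", 378, 201), ("ad", 13, 7), ("a", 231, 75),
        ("bcd", 4, 1), ("bc", 32, 16), ("bd", 7, 4), ("b", 63, 13),
        ("cd", 11, 3), ("c", 150, 67), ("d", 12, 2), ("(1)", 477, 84)]
FACTORS = "abcd"

def design_row(treatment):
    """Coefficients l_ik: +1 for the mean, then +1 or -1 as each factor is present or absent."""
    return [1.0] + [1.0 if f in treatment else -1.0 for f in FACTORS]

def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]

def fit(data, cycles=20):
    """Repeat the working-z cycle of equations (5)-(7), starting from all parameters zero."""
    rows = [design_row(t) for t, _, _ in data]
    s = len(rows[0])
    theta = [0.0] * s
    for _ in range(cycles):
        matrix = [[0.0] * s for _ in range(s)]
        rhs = [0.0] * s
        for l, (_name, n, x) in zip(rows, data):
            Z = sum(c * t for c, t in zip(l, theta))
            R = math.tanh(Z)                 # R = 2P - 1
            w = 1.0 - R * R                  # weighting coefficient
            z_work = Z + (2.0 * x / n - 1.0 - R) / w
            for j in range(s):
                rhs[j] += n * w * z_work * l[j]
                for k in range(s):
                    matrix[j][k] += n * w * l[j] * l[k]
        theta = solve(matrix, rhs)
    return theta

def chi_square(data, theta):
    """Goodness-of-fit chi-square of equation (8)."""
    total = 0.0
    for treatment, n, x in data:
        Z = sum(c * t for c, t in zip(design_row(treatment), theta))
        R = math.tanh(Z)
        total += n * (2.0 * x / n - 1.0 - R) ** 2 / (1.0 - R * R)
    return total

estimates = fit(DATA)
print([round(e, 3) for e in estimates], round(chi_square(DATA, estimates), 2))
```

Carried to convergence in this way, the estimates stay close to the second approximations given above; small differences in the later decimals are to be expected, since the hand computation used two-decimal tabulated values throughout.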

It is important to realise that with this type of data there are likely
to be a number of factors which may influence our estimate of the 
effect of say, solid reading but which have not been taken into account. 
The point does not arise in the case of well conducted experiments but 
is common in survey work. For a discussion of the critical analysis of 
survey data reference may be made to Yates (1949). 


8. That the above analysis provides a satisfactory fit is indicated by 
the value of $\chi^2$. In other cases it might be necessary to estimate
parameters representing interactions of the various factors. The
estimates of main effects are, in general, different depending on which
interactions are considered. Thus, if interactions of the other factors
with lectures represented by $i(\alpha\delta)$, $i(\beta\delta)$, $i(\gamma\delta)$ are also estimated we find


μ = −.14 ± .058     α = .24 ± .057         β = .12 ± .049         γ = .16 ± .052
δ =  .10 ± .058     i(αδ) = .10 ± .057     i(βδ) = .06 ± .049     i(γδ) = −.11 ± .052


and $\chi^2$ is 5.78 with 8 degrees of freedom.

Whilst this representation gives a slightly better fit it should be 
noted that the estimates of main effects are now considerably less 
accurate. There is little point in sacrificing information on main effects 
in order to eliminate disturbances due to a few small interactions. It 
should be remembered that there is a limit to the amount of useful 
information which can be extracted from a single body of data. 


9. One method used by Lombard and Doering involved the calcula- 
tion of partial correlation coefficients between good scores and the 
various factors. These coefficients, transformed to z (Fisher and Yates 
1948, Table VII) and divided by the standard error of the z, $\sigma_z$, are here
compared with the ratios of the estimated main effects to their standard 
errors. 


                   z      Main effect
Newspapers        5.7        5.5
Solid reading     9.4        8.9
Lectures          2.1        2.1


There is a close similarity between the two sets of figures. The present 
method, however, gives estimates of the magnitude of the effects of 
the factors and these can be expressed as proportions as in the following 
table. None of the methods given by Lombard and Doering can lead to 
similar tables. 

These figures are simply obtained by transforming μ + α, μ − α, etc.
back to the original scale. Standard errors and tests of significance 
should be confined to the estimates on the transformed scale. 
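
The back-transformation can be sketched as follows. The code is illustrative only: it takes the rounded main-effects estimates given earlier in the paper, and the dictionary keys and function names are assumptions made for the example.

```python
import math

# Rounded main-effects estimates from the fitted model (mean and four factors).
ESTIMATES = {"mean": -0.14, "newspapers": 0.16, "radio": 0.08,
             "solid reading": 0.25, "lectures": 0.10}

def proportion(z):
    """Invert Z = 0.5 * log_e(P/Q) to recover the proportion P."""
    return 1.0 / (1.0 + math.exp(-2.0 * z))

def expected_proportions(estimates):
    """For each factor return (P with factor present, P with factor absent)."""
    mu = estimates["mean"]
    return {name: (proportion(mu + effect), proportion(mu - effect))
            for name, effect in estimates.items() if name != "mean"}

table = expected_proportions(ESTIMATES)
for name, (present, absent) in table.items():
    print(f"{name:14s} {present:.2f} {absent:.2f}")
```

Each pair is obtained from the mean plus or minus the effect of the one factor concerned, the other factors being averaged on the transformed scale.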
EXPECTED PROPORTIONS OF GOOD SCORES


Factor Presence Absence 


47 

48 


SUMMARY 


Methods of analysis of results from factorial arrangements are well 
known when the data are, for example, yields. In this paper a method, 
applicable whether the numbers of observations are equal or, as is more 
usual in survey work, unequal, is given for the case in which the data 
are proportions. The data are subjected to a transformation (here the 
logit transformation is suggested) such that in the new scale of meas- 
urement observations can reasonably be represented as linear functions 
of a number of parameters. Maximum likelihood estimates of these 
parameters are then found. 
THE ESTIMATION OF RESPONSE-TIME DISTRIBUTIONS


I. FUNDAMENTAL CONCEPTS AND GENERAL METHODS 
M. R. Sampford


Lectureship in the Design and Analysis of Scientific Experiment, 
University of Oxford 


1. INTRODUCTION 


1.1. Response-time trials and their objects 


When the members of a group of experimental animals are
subjected to a treatment, their reactions (if any) do not take place
instantaneously. For some treatments the delay may be so short that
the reaction appears instantaneous, but for many there is a measurable
interval between treatment and reaction. This interval, called the
response-time, may be only a fraction of a second, a few minutes, or
several years, depending on the nature of the treatment. For example,
a mouse receiving a lethal dose of strychnine seldom lives more than
an hour, whereas a rabbit subjected to continuous gamma-radiation
may survive for several years. The response-time may be considered
as a random variable, which, for individual members of the population
from which the treated group is a sample, has a distribution of known
or unknown form. If the individuals in this group are selected at
random from the population, their observed response-times form a
random sample which may be used to estimate the distribution of
response-times in the population.
Response-times may be studied experimentally, 


(i) to estimate a relationship between time and proportionate re- 
sponse, which may be used to predict the time required for a given 
proportion of individuals to respond, or the proportionate response
expected at a given time;

(ii) as a convenient quantitative measure of toxicity, or other ac- 
tivity, in biological assay; and 

(iii) in investigating the nature of the reaction to a treatment or 
combination of treatments; for example, in attempting to determine 
whether animals which survive the effects of a poison have recovered 
or are truly immune, or in examining the joint action of two poisons. 
The questions asked in experiments of type (i) may usually be answered
by ‘non-parametric’ methods, using, for example, the proportion killed 
in the sample at a given time as an estimator of the expected proportion 
killed in the population. Such methods, however, are inadequate for 
types (ii) and (iii), and for these a mathematical model must be con- 
structed as a basis for calculation. When, as frequently happens in 
short-term experiments lasting only a few hours, exact response-times 
can be observed for all subjects, the analysis is straightforward: in 
long-term experiments, however, lasting for many weeks or months, 
various factors may operate to prevent the collection of complete data, 
and so to complicate the analysis. These complications are discussed 
in more detail in the following section. 


1.2. The response-time distribution 


With each individual of a (supposedly homogeneous) sample there 
is associated a certain initial event, determined in time, and a reaction, 
which may occur at any time after the initial event or not at all. For 
each individual showing the reaction, the response-time (denoted by $\tau$)
is defined as the time elapsing between the initial event and the re- 
action. Table I shows the initial event and the reaction for some typical 
experiments. 


TABLE I
INITIAL EVENTS AND REACTIONS FOR SOME CHARACTERISTIC EXPERIMENTS


Initial event                                      Reaction
Application of toxic agent                         Death of subject
Onset of artificially produced coma                Recovery from coma
Appearance of artificially induced or natural      'Regression' of tumour
  tumour (in this case the tumour itself is
  regarded as the 'individual')
Discharge of patient after cancer therapy          Death, from cancer or otherwise
Start of observation (for control series)          Death


It will be seen that, although the initial event may be the immediate 
cause of the reaction, this is by no means always so. The initial event 
may be a primary response to some external stimulus that produces the 
reaction as a secondary response, or even (as in the case of the control 
series) a moment arbitrarily determined in time. 

It is convenient to postulate a population of individuals, each of 
which has a potential response-time to any treatment, and such that 
the treated individuals form a random sample which can, in theory,
be used to estimate the distribution of response-time in the population. 
Estimation is complicated, however, by the fact that a potential re- 
sponse-time can only be realised by applying the treatment and ob- 
serving the reaction. If, for some reason, the reaction cannot be ob- 
served for any individual, no response-time can be recorded, and the 
sample data will then be incomplete. This complication, which is largely 
peculiar to response-time studies (although analogous phenomena
occasionally occur in other connections), is responsible for most of the
difficulties inherent in the analysis of response-time data. 

There are several reasons why an individual may not show the re- 
action, and they give different mathematical models. 

For any experiment, short-term or long-term, the data may be 
truncated, so that exact response-times are known only over part of the 
range, the only information available for the other parts being the total 
numbers responding. This is particularly important in long-term
experiments. The exigencies of a research programme may require that
an experiment be concluded at a certain time, when a proportion of 
the animals have yet to respond; all that can then be said about these 
animals is that their response-times lie in an upper tail of the
distribution. Even if the experiment is allowed to continue, it is frequently
desirable to make an interim calculation before all the animals have 
responded; from the point of view of the analysis, the situations are 
identical. 

Some individuals may fail to respond even though observation is
continued long after the last observed reaction; for example, an animal 
may recover from, or be immune to, the action of a poison. This dis- 
tinction between immunity and recovery, especially important when the 
reaction is death, may be extended to reactions of other types, dis- 
tinguishing between those individuals in which the stimulus has no 
effect and those in which an effect occurs without reaching the threshold 
level required to produce the reaction. The words ‘survival’, ‘immunity’, 
and ‘recovery’ are convenient for describing this phenomenon and its two 
sub-types, even though the reaction under investigation is not always 
death. 

Finally, the study of reactions to a particular stimulus may be 
complicated by the presence of one or more ‘nuisance’ stimuli, each 
with a potential reaction, which may be identical with that to the 
primary stimulus or may merely preclude its occurrence. The most 
important case is that in which the secondary reaction is the death of 
the individual (as a result of accident, disease, or even, in long-term
experiments, natural causes) or its removal from the experiment (which 
BIOMETRICS, MARCH 1952
may be considered as death for the purposes of the analysis). For 
example, in the study of tumour regression, an animal may be killed 
while a tumour is still present, and the tumour is then considered to 
‘die’ accidentally. Again, if the expected reaction is also death, de- 
termination of the cause of death may be impossible, so that the ob- 
served response-times form a sample from a distribution produced by 
the joint action of two or more simple distributions. A distinction will 
be drawn between ‘single-stimulus’ distributions (with or without 100%
response) and ‘multi-stimulus’ distributions (in which the reactions pro- 
duced by the separate stimuli may or may not be distinguishable). 


1.3. Historical survey 


Bliss and Stevens (1937) discussed the estimation of normal re- 
sponse-time distributions in the absence of complications, and consid- 
ered the problem of truncation, giving a maximum likelihood method 
for the analysis of truncated data. They also discussed the problem of 
survival, assuming that the animals surviving recovered from the treat- 
ment, and showed how such data might be analysed by an ‘artificial 
truncation’ method. Further work on truncated samples has been
published by Ipsen (1949) and, without particular reference to
response-time studies, by Hald (1949) and Cohen (1950).

Withell (1942) and Irwin (1942) have discussed possible distribution 
functions for survival times, with particular stress on the normal and 
exponential forms, and Box and Cullumbine (1947) have published 
empirical investigations into the most suitable choice of time metameter. 

Walsh (1950) has considered the use of ‘non-parametric’ methods in 
dealing with truncated distributions, with reference to industrial
breakdown-time trials, and Irwin and Goodman (1946) have used an actuarial
method to analyse data on the appearance of tumours in mice. 

Boag (1949) applied maximum likelihood methods to a complicated 
series of data, obtained from hospital records of cancer patients, in 
which it was possible to distinguish two causes of death and three 
classes of survivors at any time. 


1.4. General remarks 


The present paper, the first of three, is an introduction which pro- 
vides a statement of the problems involved in the analysis of response- 
time data, restates various relevant results developed in other connec- 
tions, and discusses the analysis of the simple, single-stimulus, distri- 
bution in the absence of truncation or survival. Subsequent papers will 
deal with multi-stimulus distributions, and with the analysis of data 
complicated by truncation and survival.
The problems and methods discussed are stated and developed
almost entirely in terms of experimental animals. They are equally 
applicable, however, to portions of animal tissue, plants, and even to 
inanimate objects such as electric light bulbs or radio components under 
test conditions. They may apply to human populations, as in the 
analysis of hospital records, although these usually present their own 
peculiar features, necessitating the construction of special mathematical 
models such as that of Boag already mentioned. 

Again, although the methods are applicable to reactions of any type, 
it is convenient, for simplicity of terminology, to assume the reaction 
to be death, and to speak of those animals which, at any particular time, 
have or have not shown the reaction as ‘killed’ and ‘survivors’ re- 
spectively. 


1.5. A note on the choice of parameters 


Most of the methods developed refer to the normal distribution, 
and the usual custom of defining the distribution in terms of the
parameters $\mu$ and $\sigma$, the mean and standard deviation, might seem desirable.
However, although the mean and variance of a time distribution are 
sometimes of interest for their own sakes, the purpose of the analysis 
is, more usually, to compare two or more distributions, or to calculate 
the expected proportion of the population that will exhibit the reaction 
before some specified time. This proportion is calculated from

$$\eta = \frac{t - \mu}{\sigma} = \alpha + \beta t,$$

which is most conveniently calculated in terms of α and β, while two
distributions may be as conveniently compared by means of these
parameters as by μ and σ. For this reason, and also for the sake of a
certain simplification in the calculations, the methods derived are
usually directed to the estimation of α and β: μ and σ have, however,
been retained wherever the use of α and β would result in noticeably
more cumbrous expressions.
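As a minimal sketch of this calculation, the fragment below converts a mean and standard deviation to the intercept and slope of the N.E.D. line and evaluates the expected proportion responding before a given value of the time-metameter. The function names are illustrative only, and the numerical values are those quoted as provisional estimates in section 3.8; the identities α = −μ/σ and β = 1/σ follow directly from equating (t − μ)/σ with α + βt.

```python
from statistics import NormalDist


def to_alpha_beta(mu, sigma):
    """Intercept and slope of the N.E.D. line, from (t - mu)/sigma = alpha + beta*t."""
    return -mu / sigma, 1.0 / sigma


def proportion_responding(t, alpha, beta):
    """Expected proportion responding before metameter value t (normal integral of alpha + beta*t)."""
    return NormalDist().cdf(alpha + beta * t)


# Mean and standard deviation as quoted in section 3.8 (time unit = 2 months).
alpha, beta = to_alpha_beta(10.22, 3.40)
print(round(alpha, 2), round(beta, 3))
print(proportion_responding(10.22, alpha, beta))  # proportion expected at the mean
```

Working in α and β keeps the comparison of two distributions to a comparison of two straight lines, which is the convenience the text refers to.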


2. THE NORMAL RESPONSE-TIME DISTRIBUTION 


2.1. The response-time curve 

It is assumed in this section that an experiment has been carried 
through to completion: all animals have responded to the one stimulus, 
and the response-time τ of each has been measured. If, then, a number
of different values of τ are chosen, and against each is plotted the
proportion of the sample having a response-time less than that value, a
sigmoid curve will be obtained. This sigmoid sometimes appears to be
normal, but a better visual indication of normality is obtained by
transforming the proportions to normal equivalent deviates (N.E.D.’s),
when a straight line should result. A normal distribution in τ can only
be an approximation, since the true distribution of times has a finite
lower limit, but, if a considerable time has elapsed before the occurrence
of the earliest response, the approximation may prove adequate for
practical purposes. More usually, some transformation of the time-scale
is required to straighten the N.E.D. line. As in dose-response
work, the transformation

$$t = \log \tau$$

often produces satisfactory results (Bliss & Stevens, 1937). In general,
any time metameter (transformed or not) that produces a straight
N.E.D. line will be denoted by t. (However, when the observed
distribution is produced from a ‘parent’ distribution by the interference of
a secondary stimulus, t will be used for the time-metameter that
converts the ‘parent’ distribution to the normal form; the observed
distribution will not, in general, be normal in t.)
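The visual check described above may be sketched as follows. The function is hypothetical, the response-times are invented for illustration, and ties are ignored; the last ordered observation is omitted because a proportion of unity has no finite N.E.D.

```python
import math
from statistics import NormalDist


def ned_points(response_times, transform=lambda tau: tau):
    """Pairs (metameter, N.E.D.) for the empirical response-time curve.

    The proportion attached to the k-th ordered time is k/n; the final
    point (proportion 1) is dropped because its N.E.D. is infinite.
    """
    times = sorted(response_times)
    n = len(times)
    std = NormalDist()
    return [(transform(tau), std.inv_cdf(rank / n))
            for rank, tau in enumerate(times[:-1], start=1)]


# Invented response-times, examined on the logarithmic scale.
for t, ned in ned_points([3.1, 4.0, 4.6, 5.2, 6.3, 7.9, 9.5, 13.0], transform=math.log):
    print(round(t, 3), round(ned, 3))
```

If the printed pairs lie close to a straight line when plotted, the chosen metameter may be accepted as a working R.T.M.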


2.2. Fitting the normal distribution 


Though the N.E.D. line may legitimately be used as a visual indi- 
cation of goodness of fit, it has no further use in the analysis (save, 
possibly, to give provisional estimates of parameters, where required). 
In particular, the methods of probit analysis, which have at various 
times been suggested and used for the treatment of data of this type, 
are quite inapplicable. Once the appropriate response-time-metameter 
(R.T.M.) has been found, the data should be regarded simply as a 
random sample from a population normal in the R.T.M. t, and calculations
should be performed as for any other random sample from a
normal population. The distribution is fitted by estimating μ and σ,
the mean and standard deviation, in the usual way, Sheppard’s corrections
being used where the data are grouped (but see section 3). If a
numerical test of goodness of fit is required, a χ² test on the N.E.D.
line of the type applied in fitting tolerance distributions is invalid, and
the test should be performed as for any other normal distribution, by
considering the values of the third and fourth sample moments or by a
χ² test on frequencies in various parts of the range of the R.T.M.

All the usual techniques for the treatment of normal samples may 
be used in the analysis of data of this type. In particular, if a number 
of different stimuli give distributions normal in the same R.T.M., and 
having the same variance, the distribution means may be compared by
the t test or by the methods of the analysis of variance. When the
various stimuli are the applications of different doses of a poison, the 
mean response-time is sometimes linearly related to some simple dose- 
metameter, such as log dose: a linear regression of R.T.M. on the dose- 
metameter may then be calculated (Box & Cullumbine, 1947). Such 
regressions may be of considerable value in biological assay. For ex- 
ample, if A and B are, respectively, standard and trial preparations of 
a poison, and parallel regression lines 


$$t_A = a_A + bz,$$
$$t_B = a_B + bz$$

can be fitted, where z is log dose, then

$$M_{AB} = z'_A - z'_B = \frac{a_B - a_A}{b},$$

where z′ is the log dose required to produce any given mean R.T.M.,
and M_AB is the logarithm of the relative potency. The use of survival-time
as a quantitative response in biological assay has been recommended
by Perry (1950), and is in some ways superior, given appropriate
conditions, to the standard method based on quantal responses. The 
application of response-times to biological assay will be discussed in 
more detail in the third paper of this series. 
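A rough sketch of the parallel-line calculation is given below. It assumes that the two lines are written t = a_A + bz and t = a_B + bz with a common slope b estimated by pooling the within-preparation sums of squares and products, so that equal mean R.T.M. requires z_A − z_B = (a_B − a_A)/b. The function names and the data are hypothetical.

```python
def parallel_line_fit(z_a, t_a, z_b, t_b):
    """Least-squares intercepts a_A, a_B and pooled common slope b."""
    def centred_sums(z, t):
        z_bar = sum(z) / len(z)
        t_bar = sum(t) / len(t)
        s_zz = sum((zi - z_bar) ** 2 for zi in z)
        s_zt = sum((zi - z_bar) * (ti - t_bar) for zi, ti in zip(z, t))
        return z_bar, t_bar, s_zz, s_zt

    za_bar, ta_bar, szz_a, szt_a = centred_sums(z_a, t_a)
    zb_bar, tb_bar, szz_b, szt_b = centred_sums(z_b, t_b)
    b = (szt_a + szt_b) / (szz_a + szz_b)          # pooled slope
    return ta_bar - b * za_bar, tb_bar - b * zb_bar, b


def log_relative_potency(a_a, a_b, b):
    """Horizontal distance between the parallel lines: z_A - z_B at equal mean R.T.M."""
    return (a_b - a_a) / b


# Invented mean R.T.M. values at three log doses for each preparation.
a_a, a_b, b = parallel_line_fit([0.0, 1.0, 2.0], [5.1, 2.9, 1.0],
                                [0.0, 1.0, 2.0], [4.0, 2.1, -0.1])
print(round(log_relative_potency(a_a, a_b, b), 3))
```

No standard error is attached here; fiducial limits for a ratio of this kind would normally be obtained from Fieller's theorem.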

Litchfield (1949) has described a graphical method for the analysis 
of straightforward and truncated normal samples. While this method 
may be useful for routine analyses, and to the experimenter who has 
no access to a calculating machine, its use is hardly to be recommended 
to research workers, particularly as the rigorous analysis, even of a 
truncated sample, is unlikely to take more than a fairly small proportion 
of the time taken to set up and carry through the experiment. 


3. THE TREATMENT OF GROUPED DATA 


3.1. Types of grouping 


If all reactions occur shortly after the initial stimulus, so that con- 
tinuous observation is possible throughout the experiment, individual 
response-times may be recorded. These will, of course, be ‘grouped’, 
in the sense of being measured, say, to the nearest minute, but this is a 
limitation shared by all observables; for all practical purposes the data 
may be considered ‘continuous’. Many experiments, however, last for 
days, weeks, or even years, so that the experimenter can only make 
periodic inspections of his material, usually at regular intervals, recording
the number of individuals that have responded since his previous
inspection. The problem of estimation from grouped data is thus
particularly important in response-time analysis. A special maximum
likelihood procedure making allowance for the grouping can, of course,
always be constructed; such a method is considered in section 3.7 of 
this paper. However, these methods are usually somewhat unwieldy, 
and in certain special cases simpler methods are available. Even so, 
grouping tends to complicate the analysis, and if too severely carried 
out may sacrifice an appreciable quantity of information. Every effort 
should be made to keep the grouping intervals smaller than the supposed 
standard deviation of the distribution: ideally they should not exceed 
half this quantity. 
Three types of grouping must be considered: 


(i) ‘Regular’ grouping, with observations made at equal intervals 
on the scale of the time-metameter, over a range covering all observed 
response-times. This type of grouping occurs most frequently when 
observations are made at regular intervals from the initial event until 
all individuals have responded, and the subsequent calculations are 
carried out in the original time-scale. It may also occur when observa- 
tions are made on the original scale in such a way as to transform to a 
regular grouping in the scale of some time-metameter, or when a large 
series of data is grouped for easy handling. It is convenient to consider 
as ‘regular’ data truncated at one or both ends, but regularly grouped 
over the rest of the observed range. 

(ii) ‘Transformed regular’ grouping, occurring when a grouping
‘regular’ (as defined above) in the original time-scale is transformed to 
the scale of a time-metameter. 

(iii) ‘Irregular’ grouping, covering all cases not included in (i) and 
(ii). 


3.2. Regular grouping: estimation by moments 

This is, of course, the classical grouping problem, and the solution 
is well-known. The frequency in each group is considered as concen- 
trated at a ‘class-mark’ at the centre of the group, and the required 
moments are calculated from the class-marks, adjusted by Sheppard’s 
corrections for grouping, and equated to their theoretical values, giving 
equations which may be solved for the parameters. 


3.3. Regular grouping: estimation by maximum likelihood 
If the form of the distribution is such that estimation by moments 
is impossible or undesirable (for example, in the estimation of a multi-stimulus
distribution, or of a single-stimulus distribution complicated
by ‘survival’ or truncation), two alternatives are available: grouping
may be ignored and a maximum likelihood method used as though
observations on a continuous scale were concentrated at the class-marks,
or a special maximum likelihood method for grouped data may be
constructed. The second method is more justifiable but, almost always,
more laborious. On a purely empirical basis, it seems reasonable to
suggest that the first method may be used if the grouping
interval is less than the standard deviation of the distribution, or, 
where the distribution is a compound produced by the simultaneous 
operation of two stimuli, less than the smaller of the two standard 
deviations of the separate distributions. Elsewhere the method allowing 
for grouping should be used. 


3.4. Transformed regular grouping: estimation by moments 


Data of this type may be handled by a method due to Hartley 
(1950). Suppose that the time-metameter t is defined by the relation

$$t = g(\tau), \qquad 0 < \tau < \infty,$$

and that the p.d.f. of t is f(t). The τ-scale is divided into equal intervals
of length h, with end-points τ_0 = 0, τ_1, …, τ_{r−1}, τ_r = rh = B, where
B is chosen at will, but is so large that only a negligible approximation
is involved in writing, for the j-th moment of the distribution f(t),

$$\mu'_j = \int_{g(0)}^{b} t^j f(t)\,dt, \tag{1}$$

where b = g(B). Then the moments of f(t) are given by

$$\mu'_j = b^j - j \int_0^B [g(\tau)]^{j-1}\, g'(\tau)\, P(\tau)\, d\tau, \tag{2}$$

where

$$P(\tau) = \int_{g(0)}^{g(\tau)} f(t)\,dt.$$

The value of the integrand in (2) is known at all the interval end-points,
P(τ_i) being the observed proportion killed at time τ_i, and since
the grouping in the τ-scale is regular, the integral may be evaluated by
one of the standard finite-difference integration formulae, such as the
Gauss or Gregory-Newton formula. Considerable care is needed in
applying these formulae; for details the reader is referred to Milne-Thomson
(1933) or Comrie (1950, Table Xp). It is assumed in the
above that the function g(τ) is monotonically increasing; should a
monotonically decreasing function (for example, the reciprocal) be
chosen, the value b will appear as the lower limit of the integral in (1),
and the sign of the second member of the right-hand side of (2) will
be changed.


3.5. Transformed regular grouping: estimation by maximum likelihood 


As for regular grouping, the use of a class-mark in a ‘continuous’ 
maximum likelihood process is likely to be much less laborious than the 
construction of a method allowing for grouping. There is here no single 
grouping interval, but it seems not unreasonable to suggest that, if the 
average length of the grouping interval in the neighbourhood of the 
mean is less than the standard deviation, grouping may be ignored; 
otherwise a special method should be used.

Empirical investigations with a logarithmic grouping suggest that,
when grouping is ignored, the use of the logarithm of the mid-point of
the τ-interval as class-mark produces a smaller bias than the mid-point
of the t-interval: thus the class-mark for the interval t_i to t_{i+1} should be
chosen as log [(τ_i + τ_{i+1})/2], rather than as ½ (log τ_i + log τ_{i+1}).

The function g(τ) is usually such that the interval 0 to τ_1 transforms
to an infinite tail in the t-scale. Even though the remainder of the
grouping is sufficiently fine to be ignored, it would seem unwise to
attach a class-mark to any observations falling in this tail; instead, the
data should be considered as truncated at the point t_1 on the t-scale
corresponding to τ_1, and analysed by one of the usual methods for
truncated data.
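The two candidate class-marks for a logarithmic metameter may be compared numerically as in the sketch below. The inspection times are hypothetical, natural logarithms are assumed, and the first interval (0 to τ_1) is deliberately left out, since it corresponds to the infinite tail to which no class-mark should be attached.

```python
import math


def log_class_marks(tau_lo, tau_hi):
    """Return (log of the mid-point, mid-point of the logs) for one inspection interval."""
    log_of_mid = math.log((tau_lo + tau_hi) / 2)
    mid_of_logs = 0.5 * (math.log(tau_lo) + math.log(tau_hi))
    return log_of_mid, mid_of_logs


# Hypothetical inspections at unit intervals; the interval 0-1 is treated as a truncated tail.
inspections = [1, 2, 3, 4, 5, 6]
for lo, hi in zip(inspections, inspections[1:]):
    recommended, alternative = log_class_marks(lo, hi)
    print(lo, hi, round(recommended, 4), round(alternative, 4))
```

The logarithm of the mid-point always exceeds the mid-point of the logarithms, and the difference between the two shrinks as the interval becomes narrow relative to its distance from the origin.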


3.6. Irregular grouping 


For irregular grouping, no methods comparable with the method of 
3.4 or the application of Sheppard’s corrections are available. It is 
only possible to lay down some such empirical rule as those of sections 
3.3 and 3.5, suggesting that, provided all grouping intervals in the 
observed range are less, say, than the standard deviation, grouping may 
be ignored and the data treated as a ‘continuous’ sample (possibly 


truncated), and that otherwise a special maximum likelihood solution 
is required. 


3.7. Maximum likelihood methods for grouped data 


Methods specially constructed for the analysis of grouped data do 
not in general permit of any computational simplification of the basic 
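maximum likelihood routine, and a detailed account of the method applied
to the estimation of the parameters of a normal distribution should
be sufficient to indicate the general argument.

The distribution of the metameter t is given by

$$f(t)\,dt = \frac{\beta}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(\alpha + \beta t)^2\}\,dt.$$

The data consist of records of the numbers of individuals n_0, n_1, n_2, …,
n_r (Σ n_i = n) responding in the periods t < t_1, t_1 < t < t_2, …, t > t_r.
The probability that a single observation, chosen at random, lies in
the range t_i < t < t_{i+1} is π_i, where

$$\pi_i = \frac{\beta}{\sqrt{2\pi}} \int_{t_i}^{t_{i+1}} \exp\{-\tfrac{1}{2}(\alpha + \beta t)^2\}\,dt = Q_{i+1} - Q_i \qquad (i = 1, 2, \ldots, r-1),$$
$$\pi_0 = Q_1, \qquad \pi_r = 1 - Q_r,$$

with

$$\eta_i = \alpha + \beta t_i, \qquad Z_i = Z(\eta_i) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}\eta_i^2\}$$

and

$$Q_i = Q(\eta_i) = \int_{-\infty}^{\eta_i} Z(u)\,du.$$

Then the likelihood

$$L \propto \prod_{i=0}^{r} \pi_i^{n_i}$$

and

$$\log L = k + \sum_{i=0}^{r} n_i \log \pi_i.$$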
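The maximum likelihood equations take the form

$$\frac{\partial \log L}{\partial \alpha} = \sum_{i=0}^{r} \frac{n_i}{\pi_i}\,(Z_{i+1} - Z_i) = 0,$$
$$\frac{\partial \log L}{\partial \beta} = \sum_{i=0}^{r} \frac{n_i}{\pi_i}\,(t_{i+1} Z_{i+1} - t_i Z_i) = 0,$$

where the terms Z_0, t_0 Z_0, Z_{r+1} and t_{r+1} Z_{r+1}, belonging to the infinite tails, are zero.

These equations cannot, in general, be solved to yield explicit expressions
for α and β, so the usual maximum likelihood iterative method
must be adopted (cf. Finney, 1949; 1952, Appendix II). Calculating
second order differential coefficients of log L, and replacing the observed
n_i by their expected values,

$$-E\left(\frac{\partial^2 \log L}{\partial \alpha^2}\right) = nA = n \sum_{i=0}^{r} \frac{(Z_{i+1} - Z_i)^2}{\pi_i},$$
$$-E\left(\frac{\partial^2 \log L}{\partial \alpha\,\partial \beta}\right) = nB = n \sum_{i=0}^{r} \frac{(Z_{i+1} - Z_i)(t_{i+1} Z_{i+1} - t_i Z_i)}{\pi_i},$$
$$-E\left(\frac{\partial^2 \log L}{\partial \beta^2}\right) = nC = n \sum_{i=0}^{r} \frac{(t_{i+1} Z_{i+1} - t_i Z_i)^2}{\pi_i}.$$

Then, if approximate estimates a_1, b_1 of α, β are available, corrections
to these estimates are obtained as solutions of the equations

$$nA\,\delta a_1 + nB\,\delta b_1 = \sum_{i=0}^{r} \frac{n_i}{\pi_i}\,(Z_{i+1} - Z_i),$$
$$nB\,\delta a_1 + nC\,\delta b_1 = \sum_{i=0}^{r} \frac{n_i}{\pi_i}\,(t_{i+1} Z_{i+1} - t_i Z_i), \tag{3}$$

where A, B, C, and the right-hand members of the equations are calculated
in terms of a_1 and b_1. This procedure is repeated until stable
values of a and b are obtained, when the inverse of the matrix of coefficients
used in the last cycle provides an asymptotic estimate of the
variance-covariance matrix of a and b.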

The labour involved in calculating the quantities A, B, and C is 
not inconsiderable, but as this method will only need to be applied to 
data with fairly coarse grouping, the number of groups will usually not 
be large. 
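The iterative routine may be sketched in a few lines, under assumptions that should be stated plainly: the first and last groups are treated as open tails, the group probabilities are differences of the normal integral at the N.E.D.s a + bt of the group boundaries, and the corrections at each cycle are obtained from the expected second derivatives. The function names are hypothetical. Applied to the counts of Table II with the provisional values of section 3.8 as a start, it should give estimates in the neighbourhood of those quoted there, though agreement to the last figure is not to be expected from a sketch of this kind.

```python
from statistics import NormalDist

_STD = NormalDist()


def cell_terms(a, b, cuts):
    """Group probabilities and their derivatives with respect to a and b.

    `cuts` are the interior group boundaries; the two end groups are open tails.
    """
    eta = [a + b * t for t in cuts]
    q = [0.0] + [_STD.cdf(e) for e in eta] + [1.0]
    z = [0.0] + [_STD.pdf(e) for e in eta] + [0.0]
    tz = [0.0] + [t * _STD.pdf(e) for t, e in zip(cuts, eta)] + [0.0]
    pi = [q[i + 1] - q[i] for i in range(len(q) - 1)]
    dz = [z[i + 1] - z[i] for i in range(len(z) - 1)]        # d(pi)/da
    dtz = [tz[i + 1] - tz[i] for i in range(len(tz) - 1)]    # d(pi)/db
    return pi, dz, dtz


def fit_grouped_normal(counts, cuts, a, b, tol=1e-10, max_iter=50):
    """Scoring iteration for (a, b); returns the estimates and the inverse information matrix."""
    n = sum(counts)
    for _ in range(max_iter):
        pi, dz, dtz = cell_terms(a, b, cuts)
        score_a = sum(c * d / p for c, d, p in zip(counts, dz, pi))
        score_b = sum(c * d / p for c, d, p in zip(counts, dtz, pi))
        n_a = n * sum(d * d / p for d, p in zip(dz, pi))
        n_b = n * sum(d * e / p for d, e, p in zip(dz, dtz, pi))
        n_c = n * sum(e * e / p for e, p in zip(dtz, pi))
        det = n_a * n_c - n_b * n_b
        da = (n_c * score_a - n_b * score_b) / det
        db = (n_a * score_b - n_b * score_a) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    cov = [[n_c / det, -n_b / det], [-n_b / det, n_a / det]]
    return a, b, cov


# Counts and boundaries of Table II; starting values from the uncorrected moments.
a_hat, b_hat, cov = fit_grouped_normal([2, 5, 17, 18, 14, 2],
                                       [3.5, 6.5, 9.5, 12.5, 15.5], -3.01, 0.294)
print(round(a_hat, 3), round(b_hat, 4), round(cov[0][0] ** 0.5, 3), round(cov[1][1] ** 0.5, 4))
```

The square roots of the diagonal elements of the returned matrix serve as asymptotic standard errors of a and b.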


3.8. Example 


The data of Table II, taken from a paper by Lorenz et al. (1947)
and coarsely grouped for the purposes of the example, consist of numbers
of mice, in a control series, dying in successive periods (1 unit = 2
months). The survival-times themselves appear to be normally distributed,
and this is confirmed by additional data from the same source.
In fact, of course, regularly grouped data such as these would be analysed 
by the usual method, applying Sheppard’s corrections, but here they 
are analysed by the method of section 3.7, in order to provide a com- 
parison between the results obtained by this method and by the use 
of Sheppard’s corrections. 


TABLE II
DEATHS OF MICE IN CONTROL SERIES (LORENZ et al. 1947)

Period         Number of deaths    Class mark (t′)
 0.5- 3.5              2                  2
 3.5- 6.5              5                  5
 6.5- 9.5             17                  8
 9.5-12.5             18                 11
12.5-15.5             14                 14
15.5-18.5              2                 17


Provisional estimates are obtained from the uncorrected mean and 
variance (the class mark t′ in Table II is the mid-point of the period).

$$n = 58, \qquad \sum n_i t'_i = 593, \qquad \sum n_i t'^2_i = 6721,$$
$$m = 10.22, \qquad s = 3.40,$$

giving estimates

$$a_1 = -3.01, \qquad b_1 = 0.294.$$


η_i, Z_i and Q_i are now tabulated, with the required differences. The
sums are most easily calculated by tabulating

$$\frac{Z_{i+1} - Z_i}{\pi_i} \quad \text{and} \quad \frac{t_{i+1} Z_{i+1} - t_i Z_i}{\pi_i}$$

and summing products of these columns with those of undivided differences.
The calculations are shown in Table III.
Equations (3) become

$$54.452\,\delta a_1 + 557.412\,\delta b_1 = 0.369,$$
$$557.412\,\delta a_1 + 6881.414\,\delta b_1 = 18.387.$$

The variance matrix is

$$[V] = \begin{bmatrix} 0.10752442 & -0.00870975 \\ -0.00870975 & 0.00085083 \end{bmatrix}$$

and

$$\begin{bmatrix} \delta a_1 \\ \delta b_1 \end{bmatrix} = [V] \begin{bmatrix} 0.369 \\ 18.387 \end{bmatrix} = \begin{bmatrix} -0.1205 \\ 0.0124 \end{bmatrix}, \qquad a_2 = -3.13, \quad b_2 = 0.306.$$

A further cycle gave final estimates of

$$a_3 = -3.14 \pm 0.34,$$
$$b_3 = 0.307 \pm 0.030.$$


Both a and b have been increased in absolute magnitude, compared 
with the provisional estimates; these increases correspond to the reduced 
value of the variance obtained by allowing for the effect of grouping. 
In this example the results of the maximum likelihood analysis can be 
compared with the results of applying Sheppard’s corrections. The 
interval h = 3, so that the crude sample variance must be reduced by 
0.75. This amends the provisional values to 


s = 3.29
a = −3.11
b = 0.304;


values which differ from the maximum likelihood estimates by only 
10% of their standard errors. 
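The moment calculations of this example are easily checked. The sketch below recomputes the provisional estimates from the class marks and frequencies of Table II and then applies Sheppard's correction h²/12 to the variance; the divisor n − 1 is assumed for the crude variance, since it is the choice that reproduces the quoted value of s.

```python
import math

class_marks = [2, 5, 8, 11, 14, 17]   # mid-points of the periods in Table II
deaths = [2, 5, 17, 18, 14, 2]
h = 3.0                               # grouping interval

n = sum(deaths)
sum_t = sum(f * t for f, t in zip(deaths, class_marks))
sum_t2 = sum(f * t * t for f, t in zip(deaths, class_marks))

mean = sum_t / n
crude_var = (sum_t2 - sum_t ** 2 / n) / (n - 1)
corrected_var = crude_var - h ** 2 / 12          # Sheppard's correction

for label, var in (("crude", crude_var), ("corrected", corrected_var)):
    s = math.sqrt(var)
    print(label, round(s, 2), round(-mean / s, 2), round(1 / s, 3))
```

The first line of output corresponds to the provisional estimates and the second to the values amended by Sheppard's correction.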

4. ESTIMATION OF FUNCTIONS OF THE PARAMETERS 
4.1. Introduction 


If (μ, σ) or (α, β), the parameters of a normal time-mortality distribution,
are estimated by (m, s) or (a, b), the N.E.D. η(t) is estimated
by the function

$$y(t) = \frac{t - m}{s} = a + bt.$$

The relation

$$p = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} \exp\{-\tfrac{1}{2}u^2\}\,du \tag{4}$$

may then be used to obtain an estimate p_{T′} of the proportion killed at
time T′, or of the time t_{P′} corresponding to a proportionate kill P′.

Expressions for standard errors and fiducial limits of p_{T′} and t_{P′} may
be obtained in terms of (m, s) or (a, b). Only exact formulae for variances
are included, as fiducial limits can always be calculated from
formulae that, if not exact, involve only minor and fairly easily justi- 
fiable approximation. In other cases asymptotically valid formulae for 
variances can be derived if required, but their use is not recommended, 
particularly as many of the functions concerned have distributions 
which are very far from normal.


4.2. Estimates from (a, b)

If a sample is complicated by accidental deaths or by truncation, 
the maximum likelihood method is used, leading to estimates (a, b), 
or, more usually, (a′, b), where

$$y(t) = a' + b(t - \bar{t}),$$

a′ and b are uncorrelated, and t̄ is some weighted mean of the observations.
Estimated variances V(a′) and V(b) are obtained from the last
cycle of maximum likelihood calculations, and the distributions of a′
and b may be assumed approximately normal.

Then

$$y_{T'} = a' + b(T' - \bar{t}),$$
$$V(y_{T'}) = V(a') + (T' - \bar{t})^2\, V(b) \tag{5}$$

and fiducial limits to η_{T′} may be calculated as

$$y_{T'} \pm (t \times \text{S.E.}(y_{T'})) \tag{6}$$

where t is the normal deviate corresponding to the required probability
level. A point estimate and fiducial limits of p_{T′} are obtained by
transforming, by (4), the corresponding values for y_{T′}.

The time t_{P′} required to reach an expected proportionate kill P′ is
given by

$$\eta_{P'} = a' + b(t_{P'} - \bar{t})$$

or

$$t_{P'} = \bar{t} + \frac{\eta_{P'} - a'}{b},$$

where η_{P′} is the N.E.D. corresponding to the given proportion P′.

Fiducial limits for t_{P′}, exact apart from the assumptions of normality
for a′ and b and the use of their asymptotic variances, are (Fieller,
1940)

$$t_{P'} + \frac{g}{1 - g}\,(t_{P'} - \bar{t}) \pm \frac{t}{b(1 - g)} \sqrt{(1 - g)\,V(a') + (t_{P'} - \bar{t})^2\, V(b)}, \tag{7}$$

where

$$g = \frac{t^2\, V(b)}{b^2},$$

results which may be compared with those for the LD 50 in probit
analysis (Finney, 1952, §19). Fiducial limits for τ_{P′} are obtained by
transforming those for t_{P′}.
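The limits for the time corresponding to a given proportionate kill may be sketched as follows. The function assumes Fieller's theorem applied to the ratio (η − a′)/b with a′ and b uncorrelated, where η is the N.E.D. of the required proportion and g = t²V(b)/b², the deviate t being taken from the normal distribution. The function name and the numerical values are hypothetical, and the variances are treated as known.

```python
import math
from statistics import NormalDist


def time_for_kill(p_target, a_prime, b, t_bar, var_a, var_b, level=0.95):
    """Point estimate and Fieller limits for the metameter giving proportion p_target."""
    std = NormalDist()
    eta = std.inv_cdf(p_target)                  # N.E.D. of the required proportion
    t_hat = t_bar + (eta - a_prime) / b
    dev = std.inv_cdf(0.5 + level / 2)           # two-sided normal deviate
    g = dev ** 2 * var_b / b ** 2
    if g >= 1:
        raise ValueError("slope not significant at this level; limits are not finite")
    centre = t_hat + g / (1 - g) * (t_hat - t_bar)
    half = dev / (abs(b) * (1 - g)) * math.sqrt(
        (1 - g) * var_a + (t_hat - t_bar) ** 2 * var_b)
    return t_hat, centre - half, centre + half


# Invented estimates from a final maximum likelihood cycle.
print(time_for_kill(0.5, a_prime=0.1, b=0.3, t_bar=10.0, var_a=0.02, var_b=0.0009))
```

When g is small the limits are nearly symmetrical about the estimate; as g approaches unity they widen rapidly, which is the usual warning that the slope is poorly determined.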


4.3. Estimation from (m, s)


If the data consist of a simple normal sample, it is, of course, possible
to estimate α and β, and use the formulae of §4.2. However, not only
is the calculation of m and s², the unbiassed estimates of μ and σ², more
convenient, but it has the advantage that the sampling distributions of
m and s are known exactly, so that exact fiducial limits for η_{T′} and t_{P′},
and hence for p_{T′} and τ_{P′}, may be calculated by using the properties
of the non-central t-distribution (Johnson & Welch, 1940).


4.4. Multi-stimulus distributions 


If the sample is complicated by survival or natural deaths a maximum
likelihood method may be used to give estimates of (μ, σ) or
(α, β), the parameters of the hypothetical normal distribution resulting
from the treatment alone. The methods of sections 4.2 and 4.3 can 
then be used, if required, to make predictions about a population in 
which that distribution is unmodified by survival or natural death. 
However, such predictions are likely to be useless, as the factors causing 
these modifications will presumably operate in the population as well 
as in the sample: if the predictions are to be of any practical value, they 
should be based on the modified distribution. Appropriate formulae 
for distributions complicated by natural deaths will be given in the
second paper, and by survival in the third paper of this series. 


5. NON-PARAMETRIC METHODS 


Response-time experiments are frequently required to estimate one 
or more of the quantiles of the distribution function. If the form of 
the distribution is suitable they may be calculated from the estimated 
parameters (section 4), but direct estimation by the sample quantiles 
is sometimes more convenient, and this method must be used when the 
form of the distribution is unknown. It is applicable to any ‘single- 
stimulus’ response-time distribution, and also to any ‘multi-stimulus’ 
distribution whose quantiles are required. For example, there would 
be little point in estimating the quantiles of a distribution compounded 
of a treatment effect and an accidental death-rate, because the accidents
occur as part of the experiment and not as a property of the
sample, but the quantiles of a treatment-natural death compound are
of practical interest, as this distribution might be expected to operate
under other conditions than those of the experiment. Non-parametric
methods are also valuable in the analysis of truncated data, and of
data from a fundamentally normal distribution, modified by ‘recovery’
for large values of t, but still effectively normal in the lower part of
the range. In particular, the use of the median as an estimator of the 
mean of a normal population involves much less work than an analysis 
by one of the maximum likelihood methods appropriate to truncated 
normal distributions. 

As an estimator, the sample quantile has the further advantage that, 
if the experiment has no other object than the estimation of the popula- 
tion quantile, observation need be continued only until the required 
proportionate response is obtained. Such a procedure is, of course, 
‘less efficient’ than the completion of the experiment, in the sense that 
less information is obtained from each individual animal: on the other 
hand, compensation for this may be effected by increasing the sample 
size, and a very considerable saving of time may result. In practice, 
the method chosen will depend on the nature and costs of the experi- 
ment. For example, break-down trials on mass-produced radio com- 
ponents are usually made to ensure that not more than a given small 
proportion fail in an initial guaranteed period, so that the experimenter 
is content with obtaining, for example, the 5% quantile. Walsh (1950) 
has shown that for such an estimation the loss of information resulting 
from the use of the sample quantile is small, so that only a proportion- 
ately small compensatory increase in sample size is needed. For the 
manufacturer, the cost of a few additional components is trivial, while 
the time saved by truncating at the 5% quantile is great. On the other 
hand, in studying the effects of a rare drug on the larger experimental
animals, the experimenter will probably wish to determine at least the
median, and possibly even higher quantiles. For these the loss of information
is more serious, and the cost and difficulty of enlarging the
sample may not be offset by the saving of time resulting from the use
of non-parametric methods, so that the experimenter would doubtless 
prefer to continue observations until all animals have responded, thus 
obtaining all available information from the sample. 

Methods of determining standard errors and constructing confidence
limits for quantiles, together with other non-parametric methods of 
treating distributions of unknown form, are, in general, not peculiar to 
response-time distributions, and need not be discussed in detail. They 
are described by Cramér (1946, p. 367), Walsh (1950) (with special 
reference to response-time studies), Wolfowitz (1949, p. 93) (including 
a fairly extensive bibliography) and others. 

The methods used in actuarial statistics may be described as non-parametric,
inasmuch as they do not rest upon the assumption of any
particular form of distribution, and they are, of course, specifically de- 
signed for use with response-time data of a kind. In fact, when no 
assumption can be made about the nature of the distribution of re- 
sponse-times, such methods are ‘most efficient’. There is no reason 
why these methods should not be applied to data from laboratory 
experiments, and, indeed, they have been successfully so applied by, 
among others, Irwin and Goodman (1946), and Irwin (1949). However,
under the more uniform conditions of laboratory work, data can be 
fitted by a distribution of standard form much more frequently than in 
the field of vital statistics; when such a fit is possible, actuarial methods 
are considerably less efficient than those based on the form of the 
distribution.


SUMMARY 


The principal types of experiment which lead to response-time data 
are considered, and the difficulties inherent in the analysis of such 
data are discussed. The applications of standard statistical methods to 
normal response-time distributions are described. The problem of 
grouping is discussed in some detail, and various results relevant to 
this problem, and to that of attaching standard errors and fiducial 
limits to functions of estimated parameters, are restated. The condi- 
tions under which non-parametric methods may profitably be used in 
the analysis of response-times are discussed. 


I wish to express my thanks to Dr. D. J. Finney and Dr. H. O. Hartley for helpful advice during 
the preparation of this paper.


THE USE OF RANKS IN A TEST OF SIGNIFICANCE
FOR COMPARING TWO TREATMENTS 


Colin White
Department of Physiology, University of Birmingham, England 


WILCOXON (3) in 1945 introduced a ranking method for determining
the significance of the difference between two treatments. Since
in such methods ranks 1, 2, 3, …, n are substituted for the numerical
data, there is a certain sacrifice of information, so that Wilcoxon regarded
his method as giving a “rapid approximate idea of the significance
of the differences”.

The method was extended by Mann and Whitney (2) to deal with 
the case where the numbers of observations in the two groups are not 
the same. They calculate a statistic U and provide tables showing the 
probability of obtaining a U not larger than that tabulated, in com- 
paring two samples which contain m and n items respectively. The 
values of U tabulated cover all cases where n and m vary from 3 to 8. 
The authors prove that the limit distribution of U is normal if m and 
n go to infinity in any arbitrary manner, and they suggest that this
approximation may be used to obtain critical values of U when n or m 
is greater than 8. There is a simple relationship between U and the 
rank total calculated by Wilcoxon. 

In the present paper elementary methods are employed to develop 
tables for the use of the Wilcoxon procedure when the numbers in the 
two groups are not necessarily the same. No attention is given to the 
case where the observations are paired, since this has been fully treated 
by Wilcoxon. The tables can be used when there are as many as 30 
members in the two groups combined. The tabulation has been made 
as extensive as this, partly because the use of the normal approximation 
is not accurate enough when critical values of the rank sum are required 
at high levels of significance, and partly because it seems fitting to 
match the simple method of calculating the rank sum with an equally 
simple method of assessing its significance. 

As an illustration the method is applied to some physiological data. 


FREQUENCY DISTRIBUTION OF RANK TOTALS 


Let n_1 = number of individuals in the group of which we require the
          rank total, T.
    n_2 = number of individuals in the other group.


The ranks to be allotted are 1, 2, 3, …, (n_1 + n_2), and we require
a means of evaluating the number of ways in which this can be done
so that the rank total of the group in which we are interested assumes
a particular value of T.

It is clear that the lowest value the rank total can have is 


$$\frac{n_1(n_1 + 1)}{2},$$


and the highest value is


$$\frac{n_1(n_1 + 2n_2 + 1)}{2},$$


and that all integral values between these two are possible rank totals. 


Let W_T^{n_1, n_2} = number of ways of obtaining the rank total T when
there are n_1 members in the group of which we require
the rank total and n_2 members in the other group.


In each of the above ways the highest rank (n_1 + n_2) is either used
or not used. The cases where it is not used are equal to W_T^{n_1, n_2−1}.
The cases where it is used are equal to W_{T−n_1−n_2}^{n_1−1, n_2}, since, if we must use
this rank, we have, in effect, to find the number of ways of making a
total of (T − n_1 − n_2) from the remaining ranks.

Hence

$$W_T^{n_1,\, n_2} = W_T^{n_1,\, n_2 - 1} + W_{T - n_1 - n_2}^{n_1 - 1,\, n_2} \tag{1}$$

For example, there are 18 ways of forming a total of 23 when 5 of
the numbers 1, 2, 3, …, 13 are summed; there are 17 ways of forming
a total of 23 when 5 of the numbers 1, 2, 3, …, 12 are summed; and
there is one way of forming a total of 10 when 4 of the numbers 1, 2,
3, …, 12 are summed. Since 18 = 17 + 1, this illustrates formula (1)
when n_1 = 5, n_2 = 8 and T = 23.
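The recurrence lends itself to a short recursive sketch. The function below counts the ways of choosing n_1 of the ranks 1 to (n_1 + n_2) with a given total by separating the selections that omit the highest rank from those that include it; the boundary conditions (an empty group, or no members in the other group) are supplied here as assumptions needed to terminate the recursion, and the function name is illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ways(total, n1, n2):
    """Number of ways of choosing n1 of the ranks 1..(n1 + n2) so that they sum to `total`."""
    if total < 0:
        return 0
    if n1 == 0:
        return 1 if total == 0 else 0          # empty selection has total zero
    if n2 == 0:
        # every rank must be taken, so only one total is possible
        return 1 if total == n1 * (n1 + 1) // 2 else 0
    # highest rank (n1 + n2) not used, plus highest rank used
    return ways(total, n1, n2 - 1) + ways(total - n1 - n2, n1 - 1, n2)


# The three counts of the worked illustration: n1 = 5, n2 = 8, T = 23.
print(ways(23, 5, 8), ways(23, 5, 7), ways(10, 4, 8))
```

Caching the intermediate counts plays the part of the smaller tables from which the larger ones are built up.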

Since we are to work with rank totals, it is simplest to base all 
calculations on the sum of the ranks allotted to the smaller group, if 
the groups are unequal in size; otherwise we base our calculation on 
the rank total of the “first” group, and we may arbitrarily take either 
of the equal groups as the first. 

For the present work tables have been drawn up showing the number
of ways of obtaining various rank totals, T, when n_1 + n_2 ≤ 30. First
the tables were written down on inspection for the cases where n_1 =
1, 2, 3, …, 30 and n_2 = 0. Then the remaining tables were either
derived by the use of formula (1) or in the simpler cases were written
down by inspection. As explained later, in most cases the full tables
were not necessary for the present project.

TABLE 1. EXTRACT FROM THE FREQUENCY DISTRIBUTION OF RANK TOTALS
OF THE SMALLER GROUP WHEN THERE ARE 5 MEMBERS IN THIS GROUP AND 18
MEMBERS IN THE LARGER GROUP


Rank Total    Frequency    Cumulated Frequency
    15             1                  1
    16             1                  2
    17             2                  4
    18             3                  7
    19             5                 12
   ...           ...                ...
    33           141                769
    34           163                932
   ...           ...                ...
    59           966             16,341
    60           967             17,308
    61           966             18,274
   ...           ...                ...
   101             5             33,642
   102             3             33,645
   103             2             33,647
   104             1             33,648
   105             1             33,649
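The entries of Table 1 can be checked by direct enumeration, which is feasible for groups of this size. The sketch below lists every selection of 5 ranks from 1 to 23, tallies the rank totals and cumulates the frequencies; it is a brute-force check, not the recurrence method used to construct the tables.

```python
from collections import Counter
from itertools import accumulate, combinations

n1, n2 = 5, 18

# Frequency of each rank total over all selections of n1 ranks from 1..(n1 + n2).
freq = Counter(sum(ranks) for ranks in combinations(range(1, n1 + n2 + 1), n1))
totals = sorted(freq)
cumulated = dict(zip(totals, accumulate(freq[t] for t in totals)))
grand_total = cumulated[totals[-1]]

for t in (15, 19, 33, 34, 60, 105):
    print(t, freq[t], cumulated[t], round(100 * cumulated[t] / grand_total, 2))
```

The last column gives the cumulated frequency as a percentage of all selections, which is the quantity compared with the chosen probability level.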


The resulting tables have been described as giving the number of
ways in which various rank totals can be obtained when $n_1$ of the ranks
1, 2, 3, ..., $(n_1 + n_2)$ are summed. They may equally well be regarded
as giving the frequency distributions of the rank totals obtained under
the hypothesis that $n_1$ ranks are withdrawn, at random, from the finite
universe 1, 2, 3, ..., $(n_1 + n_2)$ and summed. To obtain the critical
levels tabulated in this paper it is necessary to have available only as
much of the frequency distributions as will include the first 2½% of
the total frequency; but in order to obtain this 2½% for the largest
distribution, with the help of the recurrence formula, it is necessary to
list more than 2½% of the smaller distributions. Table 1 gives an
extract of the frequency distribution of rank totals obtained when there
are 5 members in the smaller group and 18 in the larger.

GENERAL PROCEDURE FOR SIGNIFICANCE TEST


First it will be noticed that if we always gave rank 1 to the lowest
numerical observation, rejection of the null hypothesis would sometimes
imply that $T$ were lower than would reasonably be expected on a chance
basis, and sometimes higher. It is an advantage to have one critical
point rather than two for each probability level, for this simplifies the
calculation, printing and use of the tables of critical values. It is
therefore convenient to specify that in all cases we evaluate the results
of a test by using either $T$, the rank total of the smaller (or first) group,
or its conjugate, $T'$, whichever is the smaller. The conjugate total is
obtained by ranking the observations so that the highest numerical
value is given rank 1; but it can also be calculated from the formula
$n_1(n_1 + n_2 + 1) - T$. Since the frequency distribution of rank totals
is symmetrical, the probability of obtaining a certain value of $T$ is the
same as the probability of obtaining the corresponding conjugate value
$T'$. Therefore in order to carry out a two-tailed significance test at a
probability level of say $P$% we specify as a critical point that rank total
below which $P/2$% of the rank totals in the appropriate frequency
distribution lie. Significance at the $P$% level is then established if
either $T$ or $T'$ lies below this critical point.


CALCULATION OF CRITICAL POINTS 


Consider, as an illustration, the case where there are 5 members in
the smaller group and 18 in the larger. The frequency distribution of
the rank totals expected for the smaller group under the null hypothesis
is shown in part in Table 1. If the whole of the frequency distribution
has not been derived it is necessary to calculate the total of the
frequencies by evaluating $\binom{n_1 + n_2}{n_1}$. In this case the total
of the frequencies is $\binom{23}{5}$ or 33,649. If we wished to obtain the
5% critical point we would seek that rank total below which 2½% of all
the rank totals would lie. Now 2½% of 33,649 is 841.2. From the
frequency distribution we note that 769 rank totals, that is, 2.29% of
the whole, are equal to or less than 33; and 932, that is, 2.77% of the
whole, are equal to or less than 34. There is, then, no exact point
below which 2.5% of the totals lie; but to simplify the tabulation of
the critical points it is convenient to regard as the 5% critical point
the highest rank total below which 2.5% or less of the rank totals lie;
that is, in this case, 33. The 5% point so defined will not in general
be true to name, since the test will be rather more stringent than a P
of 0.05 would indicate. However the discrepancy is not serious for a
procedure of this type. It would be possible of course to give the
exact P value for each rank sum tabulated but this information would
not be sufficient compensation for the resulting increase in the size
of the tables.
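The calculation just described may be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the frequency distribution is built by direct counting of subsets rather than by the recurrence formula, which gives the same frequencies.

```python
from math import comb


def rank_total_counts(n1, n2):
    """counts[T] = number of ways n1 of the ranks 1..(n1 + n2) can sum to T."""
    n = n1 + n2
    max_total = sum(range(n2 + 1, n + 1))  # the n1 highest ranks
    # table[k][t]: ways of choosing k ranks with total t from ranks seen so far
    table = [[0] * (max_total + 1) for _ in range(n1 + 1)]
    table[0][0] = 1
    for rank in range(1, n + 1):
        # k runs downwards so that each rank is used at most once
        for k in range(min(rank, n1), 0, -1):
            for t in range(rank, max_total + 1):
                table[k][t] += table[k - 1][t - rank]
    return table[n1]


def critical_point(n1, n2, two_tailed_level):
    """Highest rank total whose cumulated frequency does not exceed half the level."""
    counts = rank_total_counts(n1, n2)
    limit = comb(n1 + n2, n1) * two_tailed_level / 2
    cumulated, critical = 0, None
    for total, frequency in enumerate(counts):
        cumulated += frequency
        if cumulated > limit:
            break
        if frequency:
            critical = total
    return critical  # None when no rank total is extreme enough


print(critical_point(5, 18, 0.05))
```

For 5 and 18 members the sketch reproduces the cumulated frequencies 769 and 932 of Table 1 and returns 33 as the 5% point. A return value of `None` is the convention adopted here for cases in which even the smallest rank total exceeds the limit.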

The accompanying tables have been constructed by these methods
to show the 5%, 1% and 0.1% critical levels for values of $n_1$ and $n_2$
such that $n_1 + n_2 \le 30$.


EXAMPLE 

The method of using the tables may be illustrated by considering 
some data obtained by Wright (4) on the survival time, under anoxic 
conditions, of the peroneal nerve of cats and of rabbits. The survival 
times of the nerves from 4 cats were 25, 45, 33 and 43 minutes; and the 
survival times of the nerves from 14 rabbits were 28, 15, 35, 28, 35, 


5% CRITICAL POINTS OF RANK SUMS 


| | | | | 

| | | 
ns 2} 3] 4] 51 6] 9/10} 11 | 12] 13} 14 | 15 
4 | | 

' | 

5 

6 7|12| 18 | 26 

7 | 20 | 27 | 36 

8 3] 8/14| 21 | 29! 38 | 49 

9 3| 8! 15 | 22} 31 | 51 | 68 

10 3| 91} 15 | 23 | 32] 42153 | 6517 

11 4} 9|16| 24| | 

12 4|10| 17 | 26 | 35 | 46 | 58} 71 | 85 | 99 [115 | 

13 4| 10 | 18 | 27 | 37 | 48 | 6o | 73 | 88 1108 {119 

14 4| 11 | 19 | 28 | 38 | 50 | 63 | 76 | 91 |106 {123 |141 {160 

15 4} 11 | 20 | 29 | 40 | 52 | 65 | 79 | 94 |110 |127 |164 [185 

16 4 | 12 | 21 | 31 | 42 | 54 | 67 | 82 | 97 |114 |131 (150 |169 

17 5 | 12 | 21 | 32 | 43 | 56 | 70 | 84 |100 |117 |135 154 | 

18 5 | 13 | 22 | 33 | 45 | 58 | 72 | 87 |103 {121 139 

19 5 | 13 | 23 | 34 | 46 | 60 | 74 | 90 {107 |124 | 

20 5 | 14| 24 | 35 | 48 | 62 | 77 | 93 |110 

21 6 | 14 | 25 | 37 | 50 | 64 | 79 | 95 

22 6| 15 | 26 | 38 | 51 | 66 | 82 

23 6 | 15 | 27 | 39 | 53 | 68 

24 6 | 16 | 28 | 40 | 55 

25 6 | 16 | 28 | 42 

26 7117 | 29 

27 7117 

28 7 


$n_1$ and $n_2$ are the numbers of cases in the two groups. If the groups are unequal
in size, $n_1$ refers to the smaller.
23, 22, 22, 17, 20, 30, 30, 16 and 16 minutes. The question is whether
the survival times tend to be longer in one species than the other.

First the observations are ranked in order from 1 to 18, rank 1 being
given to the value 15 minutes. The ranks for the 4 cats are 9, 18, 14
and 17, and their rank total, $T$, is therefore 58. In this case the
conjugate rank total, $T'$, is 4(4 + 14 + 1) - 58, or 18, and is smaller
than $T$. The value 18 is then compared with the critical value found in
the tables for $n_1 = 4$ and $n_2 = 14$. The 5% critical value is 19 and the
1% critical value is 14, so that P lies between 0.01 and 0.05.
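The arithmetic of this example may be reproduced with a short Python sketch. The function name `rank_totals` is hypothetical, and the sketch assumes, as is the case with these data, that no value in the smaller group is tied with any other observation.

```python
cats = [25, 45, 33, 43]
rabbits = [28, 15, 35, 28, 35, 23, 22, 22, 17, 20, 30, 30, 16, 16]


def rank_totals(smaller, larger):
    """Rank total T of `smaller` (rank 1 = lowest value) and its conjugate T'.

    Assumes the values of `smaller` are not tied with any other observation.
    """
    pooled = sorted(smaller + larger)
    total = sum(pooled.index(value) + 1 for value in smaller)
    conjugate = len(smaller) * (len(pooled) + 1) - total
    return total, conjugate


T, T_conjugate = rank_totals(cats, rabbits)
print(T, T_conjugate, min(T, T_conjugate))
```

The smaller of the two totals is the one compared with the tabulated critical values.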


TIED RANKS


In the preceding discussion and in the compilation of the tables the 
case of tied ranks has been ignored. These provide no special problem 


1% CRITICAL POINTS OF RANK SUMS 


ne 2} 3! 4] 5] 6] 7] 8] 10] 11] 12] 13] 14] 15 
T 
5 15 
6 10 | 16 | 23 
7 ; 10 | 17 | 24 | 32 
8 11 | 17 | 25 | 34 | 43 
9 6 | 11 | 18 | 26 | 35 | 45 | 56 
10 6 | 12 | 19 | 27 | 37 | 47 | 58 | 71 
ul 6 | 12 | 20 | 28 | 38 | 49 | 61 | 74 | 87 
12 7 | 13 | 21 | 30 | 40 | 51 | 63 | 76 | 90 |106 
li 7 | 14! 22] 31 | 41 | 53 | 65 | 79 | 93 |109 |125 
14 7 | 14 | 22] 32] 43 | 54 | 67 | 81 | 96 |112 |147 
15 8 | 15 | 23 | 33 | 44 | 56 | 70 | 84 | 99 115 |133 |151 
16 8 | 15 | 24 | 34] 46 | 58 | 72 | 86 |102 |119 |137 |155 
17 8 | 16 | 25 | 36 | 47 | 60 | 74 | 89 |122 |140 
18 8 | 16 | 26 | 37 | 49 | 62 | 76 | 92 |108 |125 
19 3| 91/17 | 27] 38 | 50 | 64| 78 | 94 
20 3 18 | 28] 39 | 52 | 66 | 81 | 97 
21 9| 18 | 40] 53 | 68 | 83 
22 3 | 10 | 19 | 29 | 42] 55 | 70 
23 3 | 10] 19 | 30 | 43 | 57 
24 10] 20 | 31 | 44 
25 3] 11 | 20 | 32 
26 3] 11 | 21 
27 4/11 
28 4 
i | 


$n_1$ and $n_2$ are the numbers of cases in the two groups. If the groups are unequal
in size, $n_1$ refers to the smaller.
when the tie occurs in members of the same group. When there is a
tie between members of different groups, one may average the ranks
that the tied members would possess if they were distinguishable, and
then give to each this average value. An alternative method is to rank
the tied members in such a way as to decrease the departure of the
rank total from its expected value, and so take a slightly more
conservative position in regard to the overthrow of the null hypothesis
than the chosen significance level would indicate.
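The first of these two devices, the averaging of tied ranks, may be sketched as below. This is an illustrative Python fragment with a hypothetical function name; it simply gives every member of a tied set the mean of the ranks the set would occupy.

```python
def midranks(values):
    """Ranks (1 = lowest) with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    return ranks


print(midranks([10, 20, 20, 30]))
```

The rank total of a group is then the sum of the averaged ranks of its members.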


CALCULATION OF APPROXIMATE CRITICAL POINTS 


The quantity $T$ may be thought of as the sum of $n_1$ items which
have been obtained by sampling without replacement from the universe
1, 2, 3, ..., $(n_1 + n_2)$. Its mean value is therefore $n_1(n_1 + n_2 + 1)/2$
and its variance $n_1 n_2(n_1 + n_2 + 1)/12$. The distribution of $T$ approaches


0.1% CRITICAL POINTS OF RANK SUMS 


ne 3} 4; 5] 6} 7] 8] 9} 10] 11; 12] 13] 14] 15 
| | 
7 28 
8 21 | 29 | 38 
9 15 |+22 | 30 | 40 
10 15 | 23 | 31 | 41 | 52 | 63 
ll 16 | 23 | 32 | 42 | 53 | 65 | 78 
12 16 | 24 | 33 | 43 | 55 | 67 | 81 95 
13 10 | 17 | 25 | 34 | 45 | 56 | 69 | 83 | 98 {114 
14 10 | 17 | 26 | 35 | 46 | 58 | 71 | 85 |100 {116 |134 
15 10 | 18 | 26 | 36 | 47 | 60 | 73 | 87 |103 [119 |137 [156 
16 11 | 18 | 27 | 37 | 49 | G1 | 75 | 9O {105 |122 |140 
17 11 | 19 | 28 | 38 | 50 | 63 | 77 | 92 |108 |125 
18 11 | 19 | 29 | 39} 51 | 65 | 79 | 94 
19 12 | 20 | 29 | 41 | 53 | 66 | 81 | 97 
20 12 | 20 | 30 | 42 | 54 | 68 | 83 
21 | 12 21 | 31 43 | 56 | 70 
22 G | 13 | 21 | 32 | 44 | 57 | 
23 6 | 13 | 22 | 33 | 45 
24 6 | 13 | 23 | 34 
25 6 | 14 | 23 | 
26 6 | 14 | 
27 7 
| 


$n_1$ and $n_2$ are the numbers of cases in the two groups. If the groups are unequal
in size, $n_1$ refers to the smaller.
the normal as $n_1$ and $n_2$ are increased and therefore approximate critical
points for $T$ may be obtained from the expression

$$\frac{n_1(n_1 + n_2 + 1)}{2} - t\left[\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}\right]^{1/2} \qquad (2)$$

where $t$ = 1.96 when a 5% level is required
and $t$ = 2.58 when a 1% level is required.


This approximation is excellent for the 5% level. For example,
when $n_1 + n_2 = 30$ the approximate values derived by this formula
agree with the tabulated values in 11 out of 14 cases and are one unit
higher in the remaining 3 cases. The formula also gives good
approximations for the 1% critical points unless $n_1$ is small, but it
cannot be safely used to give approximations for 0.1% points.
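A minimal Python sketch of this approximation follows. It takes the expression as the mean of $T$ less $t$ times its standard deviation, using the mean and variance stated above; the function name is hypothetical, and the final step of rounding the result down to a whole rank total is an assumption made here, not a rule stated in the text.

```python
from math import floor, sqrt


def approximate_critical_point(n1, n2, t=1.96):
    """Mean of T minus t standard deviations (normal approximation)."""
    mean = n1 * (n1 + n2 + 1) / 2
    standard_deviation = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return mean - t * standard_deviation


# Rounded down (an assumption), for three pairs of group sizes.
for n1, n2 in [(4, 14), (15, 15), (2, 28)]:
    print(n1, n2, floor(approximate_critical_point(n1, n2)))
```

For 4 and 14 members the rounded value agrees with the tabulated 5% point of 19 used in the example.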

By use of the formula (2) one may obtain 5% critical points
additional to those tabulated. A simpler method of obtaining critical
values when $n_2$ is larger than is provided for in the tables is to note
that in any column of the tables the successive rows increase by
approximately equal amounts. Thus in the table for the 5% critical
points, when $n_1 = 4$ the increase in successive rows is about 0.86 on
the average, and an approximate value for $n_1 = 4$ and $n_2 = 34$ would
therefore be 36. This method may be used for all three tables.


DISCUSSION 


The use of a ranking test of the type described does not involve any 
assumption that the distribution from which the samples are drawn at 
random is normal; and this may be an important advantage. Biologists 
at times measure bizarre variates, such that one can have little intuitive 
feeling as to what form the frequency distribution of the measurements 
in the population is likely to take; and any attempt to investigate the 
distribution may involve much more work than the experiment in hand. 

As against this, one must place the disadvantage of sacrificing some 
information by the process of ranking, and the disadvantage of being 
unable to set confidence limits to the estimate of the difference between 
the two groups. The first of these criticisms has been made by David 
(1) in the course of a review of Kendall’s book on rank correlation: 
“It is interesting to note in the univariate case . . . that while many 
order statistics have been proposed (all of which are easy to apply and 
interesting mathematically) yet it is rare indeed to find the need to use 
them in practice. It is customary to twist the observations about 
and/or to make various assumptions in order that existing techniques 
may be applied. This, the writer would suggest, is because of the
instinctive feeling that tests based on ranks cannot be very
discriminating”.

This criticism must be kept in mind; but it is pertinent to note that 
it contains a rather unflattering description of the way in which some 
alternative techniques are used in certain difficult cases; and further,
that the criticism loses its force whenever the discrimination achieved
is adequate for the purpose of the experimenter. 

The disadvantage of being unable to use the ranking procedure to
make interval estimates of the difference between the two groups is a
real one in many branches of biology, especially in applied biology; but
I do not think this is true, by and large, in physiology. For instance,
in the example quoted on the survival time of the peroneal nerve, the
physiologist would not, in general, be interested in estimating the
difference between the two species, even though he wished to know that
the species did, in fact, differ. There are many cases in the physiological
literature where a similar situation exists: the experimenter is satisfied
if the statistical methods merely provide help in testing his hypothesis.

VARIANCES OF DIFFERENCES BETWEEN MEANS WHEN
THERE ARE TWO MISSING VALUES IN RANDOMIZED
BLOCK DESIGNS


WILLIAM BATEN
Michigan State College


Let us consider a randomized block design with s “treatments” and
r blocks with two missing plot values in different blocks and in
different “treatments”. In Fig. 1 the data are arranged in a two-way


                          “Treatments”
Block      1   ...    i    ...    j    ...   s       Total
  1
  .
  h                   x                              B_h + x
  .
  k                               w                  B_k + w
  .
  r
Total              T_i + x     T_j + w               G + x + w


FIG. 1. RANDOMIZED BLOCK LAYOUT


table for computing purposes. Let x and w represent estimates of the
missing plot values in the hth block and in the ith treatment and in the
kth block and the jth treatment. Let $y_{lm}$ represent the known value
in the lth block and the mth treatment ($l \ne h$ when $m = i$ and $l \ne k$
when $m = j$). Let $B_h$ and $B_k$ represent respectively the totals of the
known values in the hth and kth blocks and let $T_i$ and $T_j$ represent
respectively the totals of the known values in the ith and jth
treatments.

From the fundamental identity for this design the error sum of 
squares may be written as 


(1) Error SS = Total SS — Block SS — Treatment SS. 


By minimizing the error sum of squares, two simultaneous linear
equations in x and w arise, the solution of which gives the following
estimates of the missing values (1)


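$$x = \frac{(r - 1)(s - 1)[sT_i + rB_h - G] - [sT_j + rB_k - G]}{(r - 1)^2(s - 1)^2 - 1}$$

(2)

$$w = \frac{(r - 1)(s - 1)[sT_j + rB_k - G] - [sT_i + rB_h - G]}{(r - 1)^2(s - 1)^2 - 1}$$
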
Let us find a variance of the difference between a treatment mean 


that does not contain an estimate of a missing value and one that does: 
say the variance of the difference between the first and the ith means. 


                             “Treatments”
Block         1             i             j          Others     Total
  h           c             x             a            B        B + x + a + c
  k           f             b             w            D        D + w + f + b
Others       t_1           t_i           t_j           P
Totals   t_1 + c + f   t_i + x + b   t_j + a + w                G + x + w


FIG. 2. RANDOMIZED BLOCK LAYOUT


In Fig. 2 let c and f represent the known values in the first
treatment and in the hth and kth blocks respectively. Let a represent the
known value in the hth block and the jth treatment; let b represent the
known value in the (k, i)-cell. Let B represent the total of the known
values in the hth block except a and c; let D represent the total of the
known values in the kth block except f and b. Let $t_1$ represent the total
of the known values in the first treatment except c and f; let $t_i$ and $t_j$
represent the totals of the known values in the ith and jth treatments
except b and a respectively. Let P represent the total of the known
values in the design except a, b, c, f, B, D, $t_1$, $t_i$, $t_j$. The quantity P
contains (r - 2)(s - 3) known values or plots.

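The difference, d, between the first and ith means, if $v = (r - 1)(s - 1)$,
is equal to

$$d = \frac{t_1 + c + f}{r} - \frac{t_i + b}{r} - \frac{v[st_i + sb + rB + ra + rc - G] - [st_j + sa + rD + rb + rf - G]}{r(v^2 - 1)}$$

$$= \frac{1}{r(v^2 - 1)}\Big\{(v^2 + v - 2)t_1 - [v^2 + (s - 1)v]t_i + [v^2 - (r - 1)v - 2]c + (v^2 + v + r - 2)f - [(r - 1)v - (s - 1)]a - [v^2 + (s - 1)v - r]b - [(r - 1)v + 1]B + [v + (r - 1)]D + [v + (s - 1)]t_j + (v - 1)P\Big\}$$

$$= \frac{1}{r(v^2 - 1)}\Big\{(v^2 + v - 2)\sum_{l \ne h,k} y_{l1} - [v^2 + (s - 1)v]\sum_{l \ne h,k} y_{li} + [v + (s - 1)]\sum_{l \ne h,k} y_{lj} - [(r - 1)v + 1]\sum_{m \ne 1,i,j} y_{hm} + [v + (r - 1)]\sum_{m \ne 1,i,j} y_{km} + (v - 1)\sum_{l \ne h,k}\ \sum_{m \ne 1,i,j} y_{lm} + [v^2 - (r - 1)v - 2]c + (v^2 + v + r - 2)f - [(r - 1)v - (s - 1)]a - [v^2 + (s - 1)v - r]b\Big\}$$

The values, $y_{lm}$, are independent and the values of the y’s in any
of the above terms are independent of the y’s in any of the other terms.
The variance of each of the y’s is assumed to be equal to $\sigma^2$, of which
the error mean square in the analysis of variance table is an approximate
value. This error mean square is computed from the known values and
the estimates of the missing plots. The variance of each of a, b, c and
f is also $\sigma^2$. The variance of the above difference is

$$\text{Variance of } d = \frac{\sigma^2}{r^2(v^2 - 1)^2}\Big\{(v^2 + v - 2)^2(r - 2) + [v^2 + (s - 1)v]^2(r - 2) + [v^2 - (r - 1)v - 2]^2 + [v^2 + v + r - 2]^2 + [(r - 1)v - (s - 1)]^2 + [v^2 + v(s - 1) - r]^2 + [(r - 1)v + 1]^2(s - 3) + [v + (s - 1)]^2(r - 2) + [v + (r - 1)]^2(s - 3) + (v - 1)^2(r - 2)(s - 3)\Big\}$$

After a considerable amount of simplification this reduces to

$$\text{Var. } d = \frac{(2v^2 + sv - 2)\,\sigma^2}{r(v^2 - 1)}$$

or

$$\sigma^2\left[\frac{2}{r} + \frac{s(r - 1)(s - 1)}{r[(r - 1)^2(s - 1)^2 - 1]}\right]$$

The standard deviation of d is

$$(3) \qquad \sigma\sqrt{\frac{2}{r} + \frac{s(r - 1)(s - 1)}{r[(r - 1)^2(s - 1)^2 - 1]}} = \sigma\sqrt{\frac{2}{r} + \frac{sv}{r(v^2 - 1)}}$$

where r and s are greater than 1.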

In a similar way the standard deviation of the difference between
the means of the treatments containing estimates of the missing values,
or the standard deviation of the difference between the ith and the jth
means if $(t_i + b + x - t_j - a - w)/r = H$, can be shown to be

$$(4) \qquad \sigma\sqrt{\frac{2}{r} + \frac{2s}{r[(r - 1)(s - 1) - 1]}} = \sigma\sqrt{\frac{2}{r} + \frac{2s}{r(v - 1)}}$$

When deriving this formula it is not necessary to employ c and f. In
this case P is made up of (r - 2)(s - 2) plots.

When the missing values are in the same block and in the ith and
jth treatments, the standard deviation of the difference between any
treatment mean containing an estimate of a missing value and one not
containing a missing value can be shown to be equal to

$$(5) \qquad \sigma\sqrt{\frac{2}{r} + \frac{(s - 1)}{r(r - 1)(s - 2)}}$$

where s > 2 and r > 1.

The standard deviation of the difference between the means
containing estimates of the missing values is

$$(6) \qquad \sigma\sqrt{\frac{2}{r} + \frac{2}{r(r - 1)}}$$

When the missing values are in the same treatment the standard
deviation of the difference between any treatment mean which does not
contain estimates of the missing values and the treatment mean which
contains them is

$$(7) \qquad \sigma\sqrt{\frac{2}{r} + \frac{2s}{r(r - 2)(s - 1)}}$$

where r > 2 and s > 1.

It has been advocated that it is not necessary to find formulas for 
missing values when there is more than one missing value, because the 
formula for estimating one missing value may be repeated several times. 
By this method good estimates for more than one missing value may 
be obtained. Let us examine the case when there are two missing values
in a randomized block design with r blocks and s treatments and in which
the missing values are in the (h, i)-cell and in the (k, j)-cell. Let x
and w represent respectively these estimates.


Let us approximate these values by successive application of the 
formula for an estimate of one missing value. The formula is 


(8) $x = (sT_i + rQ_h - G')/[(r - 1)(s - 1)] = (sT_i + rQ_h - G')/v$,
(G′ = sum of known values where there is only one missing value)


provided the missing value is in the hth block and ith treatment and
$T_i$ is the sum of the known plots in the ith treatment. $Q_h$ is the sum of
the known plot values in the hth block and G′ is the sum of the known
plot values in the layout (2).

Let the first estimate of x be $x_1 = T_i/(r - 1)$, or the average of the
known values in the ith treatment. The layout now has one missing
value, w. Let us now estimate w by $w_1$ by using formula (8); this gives

$$w_1 = [sT_j + rQ_k - G - T_i/(r - 1)]/v = [M - T_i/(r - 1)]/v$$

where $M = sT_j + rQ_k - G$ (where $G + T_i/(r - 1) = G'$).

Let us use $w_1$ to find a second estimate of x. The first estimate, $x_1$, of
x is discarded now. The layout has now only one missing value, namely
x, since $w_1$ is used as an estimate of w. According to (8) this estimate
of x is

$$x_2 = \{sT_i + rQ_h - G - [M - T_i/(r - 1)]/v\}/v = [(vA - M)/v^2] + [T_i/((r - 1)v^2)],$$

where $A = sT_i + rQ_h - G$.
By discarding $w_1$ and by using $x_2$ and (8)

$$w_2 = [(vM - A)/v^2] - [T_i/((r - 1)v^3)] + [M/v^3]$$

The values

$$x_2 = [(vA - M)/v^2] + [T_i/((r - 1)v^2)]$$

(9)

$$w_2 = [(vM - A)/v^2] + [M/v^3] - [T_i/((r - 1)v^3)]$$

obtained by employing formula (8) three times are not equal to those
values obtained by the least squares method employed at the beginning
of this article in formula (2)

$$x = (vA - M)/(v^2 - 1)$$

(10)

$$w = (vM - A)/(v^2 - 1).$$
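Formula (10) is easily evaluated by machine. The sketch below, in Python, applies it to the tomato yields of Table 1 of this paper, with the two missing plots (block 2, treatment 3 and block 3, treatment 8) entered as `None`; the function names and the layout of the data as a list of blocks are choices made here for illustration.

```python
TABLE_1 = [  # rows are blocks 1-4, columns are treatments 1-12
    [177, 161, 254, 217, 145, 214, 155, 176, 133, 262, 190, 113],
    [99, 199, None, 189, 139, 146, 148, 192, 94, 124, 224, 223],
    [78, 133, 102, 132, 242, 176, 172, None, 118, 182, 181, 193],
    [164, 203, 195, 214, 107, 210, 224, 212, 116, 187, 179, 149],
]


def known_total(values):
    """Sum of the values that are not missing."""
    return sum(value for value in values if value is not None)


def two_missing_estimates(table, cell_x, cell_w):
    """Formula (10): missing plots in different blocks and different treatments.

    Cells are (block index, treatment index), counted from zero.
    """
    r, s = len(table), len(table[0])
    (h, i), (k, j) = cell_x, cell_w
    G = known_total(value for row in table for value in row)
    A = s * known_total(row[i] for row in table) + r * known_total(table[h]) - G
    M = s * known_total(row[j] for row in table) + r * known_total(table[k]) - G
    v = (r - 1) * (s - 1)
    return (v * A - M) / (v * v - 1), (v * M - A) / (v * v - 1)


x, w = two_missing_estimates(TABLE_1, (1, 2), (2, 7))
print(round(x, 3), round(w, 3))
```

Here A = 5877, M = 5953 and v = 33, and the two estimates agree with the least squares values quoted for these data.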
Consider the following randomized block layout (Table 1) where the
data have been arranged for computing purposes, and where x and w
represent estimates of missing plots. According to formulas (9) and
(10) these estimates are identical to 3 decimal places and are equal to

x = 172.783 or 173
w = 175.158 or 175


TABLE 1. DATA ON YIELD OF U.S. NO. 1 AND NO. 2 COMBINED FOR 1937 TOMATO
FERTILIZER PLOTS (1).


                                   Treatments
Block |   1 |   2 |     3 |   4 |   5 |   6 |   7 |     8 |   9 |  10 |  11 |  12
    1 | 177 | 161 |   254 | 217 | 145 | 214 | 155 |   176 | 133 | 262 | 190 | 113
    2 |  99 | 199 | (173) | 189 | 139 | 146 | 148 |   192 |  94 | 124 | 224 | 223
    3 |  78 | 133 |   102 | 132 | 242 | 176 | 172 | (175) | 118 | 182 | 181 | 193
    4 | 164 | 203 |   195 | 214 | 107 | 210 | 224 |   212 | 116 | 187 | 179 | 149


The variance of the difference of the means of treatments 3 and 8,
according to (4), is

$$\left[\frac{2}{4} + \frac{2(12)}{4(32)}\right]\sigma^2 = 0.6875\,\sigma^2$$

where $\sigma^2$ is the error mean square in the analysis of variance table
for these data.
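The variance factors (the multipliers of $\sigma^2$) for the several arrangements of the two missing plots may be tabulated with a sketch such as the following. It is written in Python with exact fractions; the function names are hypothetical, and each formula is taken in the form stated in its comment, which is a reading of (3) to (7) that reproduces the value 0.6875 above rather than a quotation of the printed formulas.

```python
from fractions import Fraction


def factor_3(r, s):
    """Different blocks and treatments; one mean has a missing plot: 2/r + sv/(r(v^2 - 1))."""
    v = (r - 1) * (s - 1)
    return Fraction(2, r) + Fraction(s * v, r * (v * v - 1))


def factor_4(r, s):
    """Different blocks and treatments; both means have a missing plot: 2/r + 2s/(r(v - 1))."""
    v = (r - 1) * (s - 1)
    return Fraction(2, r) + Fraction(2 * s, r * (v - 1))


def factor_5(r, s):
    """Same block; one mean has a missing plot: 2/r + (s - 1)/(r(r - 1)(s - 2))."""
    return Fraction(2, r) + Fraction(s - 1, r * (r - 1) * (s - 2))


def factor_6(r, s):
    """Same block; both means have a missing plot: 2/r + 2/(r(r - 1))."""
    return Fraction(2, r) + Fraction(2, r * (r - 1))


def factor_7(r, s):
    """Same treatment; mean with both missing plots against another: 2/r + 2s/(r(r - 2)(s - 1))."""
    return Fraction(2, r) + Fraction(2 * s, r * (r - 2) * (s - 1))


print(float(factor_4(4, 12)))
```

With r = 4 blocks and s = 12 treatments the second function gives 11/16, that is, 0.6875.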

The variance of this difference according to Yates’ approximate
method (2), given by Love (3), is obtained by the following scheme,
which is employed to determine the denominators in the standard
deviations of the means.

A value of unity is assigned to a plot in a block that does not contain
a missing value. A value of ½ is assigned to a plot if it does not contain
a missing value, and there is a missing value in the plot of the other
treatment in this block. A value of zero is assigned to a plot that
contains a missing value. Table 2 shows the replications that are to be
used for determining the variance of the difference of the means of
treatments 3 and 8.


TABLE 2. A SCHEME FOR FINDING THE DENOMINATORS OF THE STANDARD
DEVIATIONS OF THE MEANS.


              Replications for
            Treat. 3    Treat. 8
Block 1        1           1
Block 2        0           ½
Block 3        ½           0
Block 4        1           1
Total          2.5         2.5


The variance of the difference between these means is

$$\left[\frac{1}{2.5} + \frac{1}{2.5}\right]\sigma^2 = 0.8\,\sigma^2$$

which is larger than that obtained by the least square method. In
fact the approximate variance is about 16% larger than the least
square variance.
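The scheme of Table 2 amounts to summing the assigned replication values for each treatment and adding the reciprocals. A short Python sketch, with a hypothetical function name, follows; the comparison value 11/16 is the least square factor 0.6875 found above for the same pair of treatments.

```python
from fractions import Fraction

half = Fraction(1, 2)
treatment_3 = [1, 0, half, 1]  # replication values for blocks 1 to 4
treatment_8 = [1, half, 0, 1]


def approximate_variance_factor(first, second):
    """Multiplier of sigma^2: reciprocal of each effective replication, added."""
    return 1 / Fraction(sum(first)) + 1 / Fraction(sum(second))


approximate = approximate_variance_factor(treatment_3, treatment_8)
least_square = Fraction(11, 16)
print(float(approximate), float(approximate / least_square))
```

The ratio of the two factors is 64/55, or about 1.16.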

The variance of the above difference, according to Yates’ approximate
method, is always larger than that obtained by the least square
method, when


s > 3,
where s represents the number of treatments and r represents the
number of blocks.

According to the approximate method, the variance of the difference
of a mean that does not contain a missing value and a mean that does
contain a missing value, is always greater than that obtained by the
least square method when

$$s > \frac{(r - 1)(4r - 3) + \sqrt{(r - 1)(4r^3 - 8r^2 + 17r - 9)}}{2(r - 1)^2}$$

when the missing values are in different blocks and in different
treatments. The radical in the above is always positive when r > 1, or
when there are at least 2 replications. s is positive when r > 2.

When the missing values are in the same block, the variance of the
difference between a mean that does not contain a missing value and a
mean that does, according to Yates’ approximate method, is always
greater than that obtained by the least square method, when

$$s > \frac{4r - 3}{r - 1}$$

The variance of the difference of means, each containing a missing
value, by Yates’ approximate method, is always equal to that obtained
by the least square method, when s > 1 and r > 1 and when the missing
plots are in the same block.

When the missing plots are in the same treatment, the variance of
the difference of a mean that does not contain a missing value and the
mean containing both of the missing values, according to Yates’
approximate method, is always greater than that obtained by the least
square method, when

$$s > \frac{3r - 4}{r - 2}$$

Consider that there are 2 missing plots in the second and third blocks
of treatment 3 in Table 1 and that the value of 175 has been accepted
as the value in the plot in the third block of treatment 8. There are
now 2 missing plots in treatment 3. According to the least square
method, estimates of these are equal respectively to

x = 208.409 (x = estimate of the missing plot in the 2nd block)
w = 208.864 (w = estimate of the missing plot in the 3rd block)

According to Yates’ approximate method


50 BIOMETRICS, MARCH 1952 


$x_1$ = 224.5 (ave. of plots in this treatment), $w_1$ = 214.227,
$x_2$ = 210.197, $x_3$ = 208.608, $x_4$ = 208.431,
$w_2$ = 209.460, $w_3$ = 208.930, $w_4$ = 208.871.


In this case it was necessary to apply the iterated method 7 times
before the results from the two methods were the same to whole numbers.
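The repeated use of the single-value formula (8) may be followed step by step with the Python sketch below, given only as an illustration. The data are the Table 1 yields with 175 entered for block 3, treatment 8 and both treatment 3 plots in blocks 2 and 3 left missing (`None`); the function names are hypothetical.

```python
TABLE = [  # rows are blocks 1-4, columns are treatments 1-12
    [177, 161, 254, 217, 145, 214, 155, 176, 133, 262, 190, 113],
    [99, 199, None, 189, 139, 146, 148, 192, 94, 124, 224, 223],
    [78, 133, None, 132, 242, 176, 172, 175, 118, 182, 181, 193],
    [164, 203, 195, 214, 107, 210, 224, 212, 116, 187, 179, 149],
]


def single_missing(table, h, i):
    """Formula (8) for cell (h, i); every other cell must hold a value."""
    r, s = len(table), len(table[0])
    T = sum(table[l][i] for l in range(r) if l != h)
    Q = sum(table[h][m] for m in range(s) if m != i)
    G = sum(table[l][m] for l in range(r) for m in range(s) if (l, m) != (h, i))
    return (s * T + r * Q - G) / ((r - 1) * (s - 1))


def iterate(table, cell_x, cell_w, rounds):
    """Successive pairs (x_n, w_n), starting from x_1 = treatment average."""
    table = [row[:] for row in table]
    (h, i), (k, j) = cell_x, cell_w
    known = [row[i] for row in table if row[i] is not None]
    table[h][i] = sum(known) / len(known)  # x_1
    history = []
    for _ in range(rounds):
        table[k][j] = single_missing(table, k, j)  # w_n from x_n
        history.append((table[h][i], table[k][j]))
        table[h][i] = single_missing(table, h, i)  # x_(n+1) from w_n
    return history


for x_n, w_n in iterate(TABLE, (1, 2), (2, 2), 4):
    print(round(x_n, 3), round(w_n, 3))
```

The four pairs printed correspond to $x_1, w_1$ through $x_4, w_4$ above; further rounds approach the least square values 208.409 and 208.864.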
Consider in Table 1 that the missing values are in the second block
and in the 3rd and 8th treatments and that 175 is the value for the 3rd
block and 8th treatment.
According to the least square method, estimates of these missing
plot-values are

x = 171.267 (x = estimate of the plot in the 3rd treatment)
w = 175.267.

According to Yates’ approximate method

$x_1$ = 183.667 (ave. of known values), $w_1$ = 176.394,
$x_2$ = 171.369, $x_3$ = 171.267,
$w_2$ = 175.276, $w_3$ = 175.267.


The two methods lead to values that are about the same when the
formula for one missing value was used repeatedly. It is not necessary
to go to $x_3$ and $w_3$, as $x_2$ and $w_2$ are near enough to x and w, as found
by the least square method.

When using the formula for one missing value, one never knows how
many times it should be repeated to give a value that is approximately
equal to that obtained by the least square method. In one of the
above cases it was necessary to calculate $x_4$ and $w_4$ before the two
methods led to approximately the same estimates. Since the variances
of the differences of any two means, when there are two missing plot
values in a randomized block layout, are known, it appears to me to be
wisdom to use the formulas developed in this paper.

ON THE CONSTRUCTION OF TABLES FOR MOVING-
AVERAGE INTERPOLATION


R. THOMPSON AND CARROL S. WEIL


The Division of Laboratories and Research, New York State Department of Health,
Albany, and the Mellon Institute of Industrial Research, Pittsburgh, Pa.


The use of the method [1] of moving-average interpolation to
estimate median-effective dose M led one of us (C.S.W.) to undertake
construction of tables to give the estimate m of M and the estimated
standard deviation of its logarithm for the case $n_i = n$, a constant,
using the original formulas (6) and (A9). The present purpose is to
indicate some simple relations that afford great economies both in
construction and in presentation of such tables. Original notations [1]
will be used with $D_i$ = dosage given $n_i$ subjects, $r_i$ = the number of
these that respond critically, $s_i = n_i - r_i$, $d = \log(D_i/D_{i-1})$ which
is constant, and K = the chosen moving-average span. The interpolation
is based upon results at K + 1 successive dosage levels
$(D_a, D_{a+1}, \ldots, D_b)$ where $b = a + K$. For the given case ($n_i = n$)
to facilitate discussion define an f-function and a g-function of the
corresponding values of $r_i$ as follows:

$$(1) \qquad f = \frac{nK/2 - \sum_{i=a}^{b-1} r_i}{r_b - r_a}$$

and

$$(2) \qquad g = \frac{(1 - f)^2 r_a s_a + f^2 r_b s_b + \sum_{i=a+1}^{b-1} r_i s_i}{(r_b - r_a)^2}$$

Then relation (6) of the cited article [1: (6)] gives, for $0 \le f \le 1$,
(3) $\log m = \log D_a + d\cdot[f + (K - 1)/2]$, whence
(4) $\log m = \log D_c + d\cdot[a - c + f + (K - 1)/2]$


where c is any convenient value of the index i. From [1: (A5 to A9)] we
obtain the variance estimate by use of the relation

(5) (estimated variance of $\log m$) $= d^2 \cdot g/(n - 1)$.


However, as emphasized elsewhere [1,2], we should guard against too 
great reliance upon estimates of variance based upon results of a single 
assay. 
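As a check on the tabulated quantities, f and the variance of f may be computed exactly with the Python sketch below. Two assumptions are made and should be read as such: f is taken as the interpolation fraction between two successive moving averages, $(nK/2 - \sum_{i=a}^{b-1} r_i)/(r_b - r_a)$, and the variance is taken as $[(1 - f)^2 r_a s_a + f^2 r_b s_b + \sum r_i s_i]/[(n - 1)(r_b - r_a)^2]$ with the middle sum over i = a + 1 to b - 1. Both forms were inferred from the entries of Table 1 of this paper, which they reproduce; the function names are hypothetical.

```python
from fractions import Fraction


def f_value(r, n):
    """Interpolation fraction f for the responses r = (r_a, ..., r_b), n per dose."""
    K = len(r) - 1
    return (Fraction(n * K, 2) - sum(r[:-1])) / (r[-1] - r[0])


def variance_of_f(r, n):
    """Assumed estimate of the variance of f (d^2 times this is that of log m)."""
    f = f_value(r, n)
    products = [r_i * (n - r_i) for r_i in r]  # r_i * s_i
    numerator = (1 - f) ** 2 * products[0] + f ** 2 * products[-1] + sum(products[1:-1])
    return numerator / ((n - 1) * (r[-1] - r[0]) ** 2)


def log_m(log_D_a, d, r, n):
    """Relation (3): log m = log D_a + d * [f + (K - 1)/2]."""
    K = len(r) - 1
    return log_D_a + d * (f_value(r, n) + Fraction(K - 1, 2))


print(f_value((0, 0, 3, 4), 4), variance_of_f((0, 0, 3, 4), 4))
```

For the r-values 0, 0, 3, 4 with n = 4 the sketch gives f = 3/4 and a variance of 1/16, as tabulated.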

For the sake of economy in forming a table, consider the effects of
certain types of changes in the arguments of the functions, f and g,
defined in (1) and (2). Accordingly, let:


A be any rearrangement of $r_{a+1}$ to $r_{b-1}$, inclusive;

B be an interchange of $r_a$ and $r_b$;

C be a replacement of $r_i$ by $s_i$ throughout; and

D be any change in the values of $r_{a+1}$ to $r_{b-1}$, inclusive, provided
that their sum is unchanged.


Obviously, A leaves both f and g invariant; B changes f from $f_1$ to
$f_2 = 1 - f_1$ but leaves g the same; C leaves both f and g invariant;
and D does not change f. The effect of D upon g may be seen by
development of relations given below.
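The stated effects of A, B and C upon f can be confirmed by enumeration. The Python sketch below does so for every set of r-values with $r_a \ne r_b$; it assumes f is the interpolation fraction $(nK/2 - \sum_{i=a}^{b-1} r_i)/(r_b - r_a)$, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f_value(r, n):
    """Assumed form of f for the responses r = (r_a, ..., r_b)."""
    K = len(r) - 1
    return (Fraction(n * K, 2) - sum(r[:-1])) / (r[-1] - r[0])


def check_invariances(n, K):
    """True if A and C leave f unchanged and B turns f into 1 - f."""
    for r in product(range(n + 1), repeat=K + 1):
        if r[0] == r[-1]:
            continue  # f is undefined when r_a = r_b
        f = f_value(r, n)
        rearranged = (r[0],) + tuple(reversed(r[1:-1])) + (r[-1],)  # type A
        interchanged = (r[-1],) + r[1:-1] + (r[0],)  # type B
        complemented = tuple(n - r_i for r_i in r)  # type C
        if f_value(rearranged, n) != f:
            return False
        if f_value(interchanged, n) != 1 - f:
            return False
        if f_value(complemented, n) != f:
            return False
    return True


print(check_invariances(4, 3))
```

The reversal of the inner values stands in for "any rearrangement" of type A, since f depends on those values only through their sum.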

Let $r'_i$ denote the resultant values after a change of type D operating
on the original argument values $r_i$ (where i = a, ..., b); of course
with $r'_a = r_a$ and $r'_b = r_b$. Furthermore, let

(6) $w_i = r'_i - r_i$, whence $\sum_{i=a+1}^{b-1} [w_i] = 0$ identically, and $w_a = w_b = 0$.

Now, let $g_1$ and $g_2$ respectively denote the value of g before and after
the type D change; and let


(7) $\Delta = (r_b - r_a)^2 (g_2 - g_1)$.


Obviously, since D does not change f, we have from (2) and (7), the
following relations for $\sum$ summations taken over the index values from
$i = a + 1$ to $i = b - 1$ in each instance:

(8) $\Delta = \sum [r_i'(n - r_i') - r_i(n - r_i)] = n \sum w_i - 2 \sum r_i w_i - \sum w_i^2 = 0 - 2 \sum r_i w_i - \sum w_i^2$.

Now, in constructing parts of a table, it appears convenient wherever
possible at any one stage to proceed by a change in only two of the
TABLE 1
VALUES OF f AND ESTIMATED STANDARD DEVIATION FOR THE CASE, n = 4 AND
K = 3, FOR CALCULATION OF THE ESTIMATE m OF MEDIAN-EFFECTIVE DOSE AND
AN ESTIMATE OF THE STANDARD DEVIATION OF log m IN USE OF MOVING-AVERAGE
INTERPOLATION.

r-values   f          r-values   f          σ_f²     σ_f
0024       1.00000    0244       0.00000    1/12     0.28868
0114       "          0334       "          1/8      0.35355
0034       0.75000    0144       0.25000    1/16     0.25000
0124       "          0234       "          7/48     0.38188
0044       0.50000    same       same       (0)      (0)
0134       "          same       same       1/8      0.35355
0224       "          same       same       1/6      0.40825
1024       1.00000    0243       0.00000    4/27     0.38490
1034       0.66667    0143       0.33333    10/81    0.35136
1124       "          0233       "          22/81    0.52116
1044       0.33333    0043       0.66667    4/81     0.22222
1134       "          0133       "          22/81    0.52116
1224       "          0223       "          28/81    0.58794
1144       0.00000    0033       1.00000    2/9      0.47140
1234       "          0123       "          10/27    0.60858
2024       1.00000    0242       0.00000    1/3      0.57735
2114       "          0332       "          1/2      0.70711
2034       0.50000    0142       0.50000    1/3      0.57735
2124       "          0232       "          2/3      0.81650
2044       0.00000    0042       1.00000    1/3      0.57735
2134       "          0132       "          5/6      0.91287
2224       "          0222       "          1        1.00000
3024       1.00000    0241       0.00000    4/3      1.15470
3114       "          0331       "          2        1.41421
3034       0.00000    0141       1.00000    2        1.41421
3124       "          0231       "          10/3     1.82574
1033       1.00000    1143       0.00000    1/2      0.70711
1123       "          1233       "          5/6      0.91287
1043       0.50000    same       same       1/8      0.35355
1133       "          same       same       5/8      0.79057
1223       "          same       same       19/24    0.88976
2033       1.00000    1142       0.00000    2        1.41421
2123       "          1232       "          10/3     1.82574
2043       0.00000    1042       1.00000    4/3      1.15470
2133       "          1132       "          10/3     1.82574
2223       "          1222       "          4        2.00000


Note: The r-values are $r_a$ to $r_b$, inclusive ($b = a + K$). Let A be any rearrangement of $r_{a+1}$ to
$r_{b-1}$, inclusive, B be interchange of $r_a$ and $r_b$, and C be replacement of $r_i$ throughout by $s_i = n - r_i$;
and let f be $f_1$ before and $f_2$ afterwards. B makes $f_2 = 1 - f_1$; but f is not changed by A or C. Neither
A, nor B, nor C change $\sigma_f$. The combined operation, A·B·C, is used in each row to obtain the second
set of r-values from the first. $\sigma_{\log m} \cong d\,\sigma_f$.
r-values in type D changes; in such a case let $r_u$ and $r_v$ be the two values
affected, by raising and lowering, respectively, by an amount $h$; i.e.,

(9) $w_u = -w_v = h$, an integer (usually unity), and

(10) $\Delta = 2h(r_v - r_u - h)$; and $g_2 - g_1 = \dfrac{\Delta}{(n - 1)(r_b - r_a)^2}$.

Ordinarily, a sequence of D-changes will be used with h = 1; often
without change in the denominator of the fraction in relation (10);
which, in any case, had to be found in computation of the initial value
of g in the sequence. The increment in g is then found successively
by dividing $\Delta = 2(r_v - r_u - 1)$ by the common value of
$(n - 1)(r_b - r_a)^2$. Tables should be thoroughly checked, of course; one
such check is provided by direct computation of the increment in g
from the first to the last value in such a sequence by use of (10) with h
taken equal to the number of successive stages of unit changes in the
r-values. An example is given in construction of the complete table for
K = 3, n = 4.

Note: The original article [1] compared several methods of estimating the median-effective dose,
raised objection to some, and suggested possible modifications, one of which was the simple
moving-average interpolation method. Recently, Armitage and Allen [3], at the London School of Hygiene
and Tropical Medicine, have made practical comparisons of these methods and such modifications.
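
The shortcut for type D changes can be checked numerically. The sketch below assumes the same form of g as in the table (binomial terms r_i s_i/(n - 1), weighted by (1 - f)² and f² at the two ends) and takes the increment in g as 2h(r_v - r_u - h) divided by (n - 1)(r_b - r_a)², where position u is raised and position v is lowered by h. The names `g_value` and `g_increment` are illustrative only.

```python
from fractions import Fraction


def g_value(r, n):
    """g for the counts r = (r_a, ..., r_b), each out of n animals."""
    K = len(r) - 1
    ra, rb = r[0], r[-1]
    f = (Fraction(K * n, 2) - sum(r[:-1])) / (rb - ra)
    rs = [x * (n - x) for x in r]
    numerator = (1 - f) ** 2 * rs[0] + f ** 2 * rs[-1] + sum(rs[1:-1])
    return numerator / ((n - 1) * (rb - ra) ** 2)


def g_increment(r, n, u, v, h=1):
    """Increment in g when interior r[u] is raised and interior r[v] lowered by h."""
    delta = 2 * h * (r[v] - r[u] - h)
    return Fraction(delta, (n - 1) * (r[-1] - r[0]) ** 2)


before = (0, 1, 2, 4)
after = (0, 0, 3, 4)  # raise the 2 to 3, lower the 1 to 0
change = g_value(after, 4) - g_value(before, 4)
```

Here `change` equals `g_increment(before, 4, u=2, v=1)`, namely -1/12, which takes the tabulated 7/48 for 0124 to the tabulated 1/16 for 0034.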
ERRORS AND VARIATIONS IN WHITE-CELL COUNTS


A. C. CHAMBERLAIN AND F. M. TURNER
Atomic Energy Research Establishment, Harwell 


In 1907 Student used the distribution of spores between the squares
of a counting chamber to illustrate the Poisson distribution, and 
further studies of the distribution of white cells and the errors of counting 
techniques have been made by Bryan, Chastain and Garrey (1935), and 
Berkson, Magath and Hurn (1940). 

Our work formed part of a statistical evaluation of white cell counts 
as a means of detecting the effects of radiation exposure. 

It was desirable to assess the error of the count in the course of 
routine work, since the determinations of Bryan, Chastain and Garrey, 
and Berkson, Magath and Hurn were based on repeated counts of a 
number of special samples of blood. It was also required to extend the 
estimate of error to the differential count. The neutrophil and lymphocyte
counts, which are deduced from it, are in some ways better indices
of radiation effect than the total white cell count.


Technique of performing white-cell count 


The white-cell count is done in two stages, estimation of the total 
number of white cells or leucocytes per unit volume of blood, and esti- 
mation of the proportions of the different white cells in the total. 

Capillary blood is drawn from the finger of the patient, and the first 
few drops are discarded. Blood is then drawn into a pipette until a 
mark on the stem is reached, followed by diluting fluid until a second 
mark is reached, corresponding to a 20:1 dilution. The pipette is then 
shaken in a mechanical shaker. 

The hemacytometers or counting chambers used have two chambers 
1/10 mm. deep separated by a central trough and covered with a cover 
slip. Blood from the pipette is placed in one chamber of the hema- 
cytometer by placing a drop under the side of the cover slip remote 
from the central trough. The blood flows by capillary attraction into 
the chamber as far as the central trough, each one square millimeter 
area thus receiving 1/200 mm³ of undiluted blood.
In routine practice at A.E.R.E., blood from another pipette, filled
from the original finger puncture, is placed in the other chamber of the 
hemacytometer, and four one square millimeter areas are counted in 
each chamber. 

For the differential count a drop of blood is spread on a plain glass 
slide, fixed and stained and examined by traversing the slide with a 
microscope. The leucocytes found are identified as lymphocytes, neutrophils,
etc., and a count made of the number of each type in the first
100 or 200 leucocytes encountered. 


Error of white-cell counting


Sources of error in routine practice are :— 


(i) Field error, due to statistical variation in the number of cells 
appearing in the squares counted. 
(ii) Hemacytometer error, due principally to inaccuracies in the 
depth of the chambers and the use of non-plane cover slips. 
(iii) Pipette error, with three components 

(a) Inaccurate calibration of pipettes, 

(b) Errors in filling the pipettes to the marks, 

(c) Variations between the drops of blood used to fill the 
pipettes, which are inseparable from the true pipette errors 
in routine practice. 

(iv) Technician error, personal bias in manipulation or counting, e.g., 
in assessing whether a cell lies within the ruled lines. 


The estimation of errors as they arise in routine practice is im- 
portant. In special laboratory studies, errors of calibration can be 
avoided or allowed for, and the source of error (iii) (c) can be avoided
by first obtaining and mixing a sample of blood sufficient to fill several 
pipettes, but these refinements cannot be applied when maximum output 
at minimum cost is needed. 

To assess the principal components of error arising in practice, the 
routine procedure was modified so that two pipettes and two hema- 
cytometers were used for each count. Blood from one pipette was 
placed in one chamber of each hemacytometer, and blood from the other 
pipette in the other chambers, as shown in Figure 1, and counts made 
in the four corner squares of each chamber, sixteen squares in all. This
modified routine was used on 100 patients in the normal sequence. All
the technicians employed at the time took part, and pipettes and
hemacytometers were drawn at random from a stock in common use, 
comprising about 30 of each. 

FIG. 1. ARRANGEMENT OF PIPETTES AND HEMACYTOMETERS.

At first it appeared that the within-chamber variance in the number
of cells per square was 1.37 times the mean number per square, contrary
to the finding of Berkson, Magath and Hurn (1940) and others who have
reported almost perfect Poisson distribution. The explanation lies
in the observation of Hynes (1947) that the leucocytes tend to migrate
to the side of the chamber remote from that at which the blood is introduced,
so that the counts on the squares nearer to the central trough
tend to exceed those on the more remote squares.

The means were 


Squares nearer to trough (1, 2, 8 and 7 in Figure 1)       36.350 ± 0.008
Squares further from trough (3, 4, 5 and 6 in Figure 1)    31.815 ± 0.007


When the variation with trough distance was separated out in the 
analysis of variance in the usual way, the residual variance between 
squares was found to be very close to the mean. 

As an alternative, the counts in pairs of squares such as 4 and 1 
in Figure 1 were added together, thus eliminating trough distance from 
the analysis. This had the further advantage that the count in a pair 
of squares* was never less than 25, even when a leucopenia was en- 
countered, so that the Poisson distribution approximated very closely 
to the normal. An analysis of variance into components between 
pipettes, between hemacytometers, interaction and residual was done 
for each of the 100 counts, and the results added together to give the 
analysis of Table I. 


*Whenever this term is used it will be in the sense of a pair such as 4 and 1 in fig. 1. 
TABLE I
ANALYSIS OF VARIANCE OF LEUCOCYTE COUNTS

                                 Sum of       Degrees of   Mean
                                 squares      freedom      square      F

Between Pipettes                 17122.75        100       171.2275    2.5998
Between Hemacytometers            9384.375       100        93.8437    1.4249
Interaction                       6792.875       100        67.9287    1.0314
Residual (field error)           26344.00        400        65.8600    1.0000
Mean count in pair of squares                               68.165     1.0350


The between pipette and between hemacytometer variances are 
highly significant relative to the residual. The interaction between 
pipettes and hemacytometers is not significantly greater than the resi- 
dual. This implies that such variation as occurs in the hemacytometers 
is common to both the chambers of the instrument, since, if it were 
not, there would be a source of variance, additional to the residual or 
field error and to the variations between pipettes and between hema- 
cytometers. This conclusion was borne out by a physical examination 
of some of the hemacytometers, which revealed that the depth of the 
chambers varied from the correct value about 10 times as widely as the 
depth of one chamber of a hemacytometer varied from the other. 

Since the mean square due to residual or field error is so close to 
the mean, the latter will be taken as the best estimate, in accordance 
with the theoretical Poisson distribution. 

The estimates of error due to the first three causes mentioned above
can now be made. If $\sigma_f^2$, $\sigma_h^2$ and $\sigma_p^2$ are the variances in the count of
cells on a pair of squares due respectively to the field error, hemacytometers
and pipettes, our estimates of these quantities are given by


$\sigma_f^2 = M = 68.165$ (1)
$\sigma_f^2 + 4\sigma_h^2 = 93.8437$ (2)
$\sigma_f^2 + 4\sigma_p^2 = 171.2275$ (3)
which lead to
$\sigma_f = \sqrt{M} = 8.256$, $\sigma_h = 2.534$, $\sigma_p = 5.055$ (4)


Expressed as coefficients of variation the hemacytometer and pipette 
errors are 
$V_h = 100\,\sigma_h / M = 3.72\%$ (5)
$V_p = 100\,\sigma_p / M = 7.43\%$ (6)


These estimates compare with values of 4.7% and 4.6% respectively 
found by Berkson, Magath and Hurn (1940). These authors estimated 
the errors by repeated red-cell counts with a limited number of pipettes
and hemacytometers, and concluded that the same errors applied to 
white-cell counts since the total variation computed therefrom agreed 
with that observed experimentally. 
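
The arithmetic behind the hemacytometer and pipette components can be retraced from the mean squares of Table I. The sketch below assumes, as in the text, that the field variance is taken equal to the mean count M and that each pipette and each hemacytometer contributes four pairs of squares; the variable and function names are illustrative. The pipette figure computed this way differs slightly in the last digit from the printed one, which may reflect rounding in the original working.

```python
import math

M = 68.165                    # mean count in a pair of squares (field variance)
MS_HEMACYTOMETERS = 93.8437   # mean square between hemacytometers
MS_PIPETTES = 171.2275        # mean square between pipettes
PAIRS_PER_UNIT = 4            # pairs of squares per pipette or hemacytometer


def component_sd(mean_square, field_variance=M, pairs=PAIRS_PER_UNIT):
    """Standard deviation of one component: sqrt((MS - field variance) / pairs)."""
    return math.sqrt((mean_square - field_variance) / pairs)


sd_h = component_sd(MS_HEMACYTOMETERS)
sd_p = component_sd(MS_PIPETTES)
cv_h = 100 * sd_h / M   # hemacytometer error as a percentage of the mean
cv_p = 100 * sd_p / M   # pipette error as a percentage of the mean
```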

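The variance of a count made by using $n_h$ hemacytometers and $n_p$
pipettes and counting $n_f$ pairs of squares, or $2n_f$ squares in all is given
by

$\sigma_t^2 = \dfrac{\sigma_f^2}{n_f} + \dfrac{\sigma_h^2}{n_h} + \dfrac{\sigma_p^2}{n_p}$ (7)

and the coefficient of variation by

$V_t^2 = \dfrac{100^2 M}{M^2 n_f} + \dfrac{100^2 \sigma_h^2}{M^2 n_h} + \dfrac{100^2 \sigma_p^2}{M^2 n_p} = \dfrac{100^2}{M n_f} + \dfrac{3.72^2}{n_h} + \dfrac{7.43^2}{n_p}$ (8)

or if $n_c$ is the total number of cells counted, so that $n_c = M n_f$,

$V_t^2 = \dfrac{100^2}{n_c} + \dfrac{3.72^2}{n_h} + \dfrac{7.43^2}{n_p}$ (9)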


Only the first term in this expression, that corresponding to the 
field error, depends on the number of cells observed. The contributions 
of the hemacytometer and pipette error are independent of the absolute 
value of the count. That this is not merely a consequence of the form 
of analysis was checked by estimating the values of $V_h$ and $V_p$ independently
from the 50 lowest and 50 highest counts. The values ob-
tained were not significantly different, in accordance with the findings 
of Berkson, Magath and Hurn, but contrary to those of Bryan, Chastain 
and Garrey, who found a constant standard deviation, rather than a 
constant coefficient of variation, in their counts. 

In routine practice, using two pipettes and one hemacytometer and
counting eight squares in all we have

$n_h = 1$, $n_p = 2$, $n_f = 4$

and for a count of average level,
M = 68, corresponding to 6800 leucocytes/mm³,
equation (9) gives
$V_t = 8.8\%$


The values of the three components of equation (9) are shown in 
the first three sections of Figure 2, for various values of $n_f$, $n_h$ and $n_p$.
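
The routine-practice figure can be reproduced from equation (9). The sketch below assumes that equation in the form: squared coefficient of variation = 100²/(cells counted) + 3.72²/(hemacytometers) + 7.43²/(pipettes); the function name and argument names are illustrative.

```python
import math


def cv_total(n_cells, n_hemacytometers, n_pipettes, cv_h=3.72, cv_p=7.43):
    """Percentage coefficient of variation of the total white-cell count."""
    return math.sqrt(100.0 ** 2 / n_cells
                     + cv_h ** 2 / n_hemacytometers
                     + cv_p ** 2 / n_pipettes)


# Routine practice: one hemacytometer, two pipettes, four pairs of squares
# at an average of 68 cells per pair.
routine = cv_total(n_cells=68 * 4, n_hemacytometers=1, n_pipettes=2)

# With the same instruments, counting far more cells removes only the field term.
many_cells = cv_total(n_cells=10_000, n_hemacytometers=1, n_pipettes=2)
```

`routine` comes to about 8.8 per cent; `many_cells` stays above 6 per cent because the instrument terms do not depend on the number of cells counted.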


FIG.2. COMPONENTS OF TECHNICAL ERROR OF WHITE-CELL COUNTING. 
Technician Error


In the course of an experiment in which hourly counts were done 
on a number of subjects, pairs of technicians performed series of simul- 
taneous counts, each using the routine procedure with one hemacyto- 
meter and two pipettes. Each pair of counts thus constituted a pair 
of independent estimates of the leucocyte level at a given time. 

The pipettes used in this work were specially chosen and known to 
be accurately calibrated, and greater care than usual was taken, so that 
the results are not strictly comparable with those of the earlier ex- 
periment. 

The average coefficient of variation between the pairs of simultaneous 
counts by different technicians was 7.8%, compared with 8.8% expected 
from equation (9). 

Only one technician was found to have a significant systematic bias. 


Variations in differential counts 


The only quantitative process in the differential count is the enumera- 
tion of the different cells, and the likely sources of error are:— 
(1) statistical variation in the numbers of lymphocytes, neutrophils,
and other cells occurring in the leucocytes examined,

(2) variations in the proportions of cells in different drops of blood
used to make slides.

To estimate the error a series of 147 counts were done at various 
times and on various subjects. Two slides were prepared from con- 
secutive drops of blood from a finger puncture for each count, and two 
sets of 100 leucocytes were enumerated on each slide. For simplicity 
only the two most common types of leucocyte, namely neutrophils and 
lymphocytes, were considered separately, the remainder, including the
less common monocytes, eosinophils and basophils being added together
as ‘others’.


A typical count was as follows:— 


                         Neutrophils   Lymphocytes   Others   Total

1st Slide
  1st Set                     57            21          22      100
  2nd Set                     52            28          20      100
  Totals for 1st slide       109            49          42      200

2nd Slide
  1st Set                     66            19          15      100
  2nd Set                     56            26          18      100
  Totals for 2nd slide       122            45          33      200

GRAND TOTALS                 231            94          75      400


The results for each slide can be considered as a 3 × 2 bivariate
table and a $\chi^2$ test applied in the usual way to test the association
between ‘cells’ and ‘sets’ (Fisher 1948, Chap. 4).

Thus the first slide gives the sum of squares


$S = \dfrac{(57 - 54.5)^2 + (52 - 54.5)^2}{54.5} + \dfrac{(21 - 24.5)^2 + (28 - 24.5)^2}{24.5} + \dfrac{(22 - 21)^2 + (20 - 21)^2}{21.0}$


= 1.324
and similarly the second slide gives


$S = \dfrac{(66 - 61)^2 + (56 - 61)^2}{61} + \dfrac{(19 - 22.5)^2 + (26 - 22.5)^2}{22.5} + \dfrac{(15 - 16.5)^2 + (18 - 16.5)^2}{16.5}$


= 2.182


If the variation between sets is due solely to random variations (i.e.
according to the trinomial distribution) then S should be distributed
as $\chi^2$ with two degrees of freedom.

The experimental distribution of the 294 values of S within slides
is compared with the theoretical distribution in the first two columns of
Table II, and agreement is good. The agreement between theoretical
and experimental distributions can be tested by a further $\chi^2$ test, and
gives P = 0.2.

In the same way the distribution between slides can be tested, using 
the column totals for each slide, giving in the example 


$S = \dfrac{(109 - 115.5)^2 + (122 - 115.5)^2}{115.5} + \dfrac{(49 - 47)^2 + (45 - 47)^2}{47} + \dfrac{(42 - 37.5)^2 + (33 - 37.5)^2}{37.5}$


= 1.982


If the variation is solely due to the random fluctuations in numbers
of cells enumerated S should again be distributed as $\chi^2$ with 2 degrees
of freedom, but if there is in addition a ‘between slides’ variance the
distribution should be shifted towards the higher values.
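
The three values of S in the worked example can be recomputed with a few lines of code. The sketch below assumes two groups of equal size (two sets of 100 cells, or two slides of 200), so that the expected count in each cell is simply the mean of the two observed counts; the function name is illustrative. The within-slide results agree with the quoted 1.324 and 2.182 to within a unit in the third decimal place.

```python
def homogeneity_s(group_1, group_2):
    """Sum over cell types of (observed - expected)^2 / expected for two
    equal-sized groups, the expected count being the mean of the pair."""
    total = 0.0
    for a, b in zip(group_1, group_2):
        expected = (a + b) / 2
        total += ((a - expected) ** 2 + (b - expected) ** 2) / expected
    return total


# Counts are (neutrophils, lymphocytes, others).
s_slide_1 = homogeneity_s((57, 21, 22), (52, 28, 20))
s_slide_2 = homogeneity_s((66, 19, 15), (56, 26, 18))
s_between = homogeneity_s((109, 49, 42), (122, 45, 33))
```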

The comparison between the 147 experimental values of S and the
distribution of $\chi^2$ with two degrees of freedom is given in the last two
columns of Table II. The agreement is not quite so good as for the
within slides analysis (P = 0.1), but the mean value of S is almost
exactly the same, and there is therefore no reason to think that a ‘be- 
tween slides’ error exists. 


Error of differential count 


Having established that the random variation in number of cells 
encountered is the only significant source of variation in the differential 
count, the error can be written down at once. 

If $p$ is the proportion of an individual type of white cell, neutrophils for
example, in a population of leucocytes, and $\hat{p}$ is the value of $p$ estimated
TABLE II
DISTRIBUTION OF S WITHIN SLIDES AND BETWEEN SLIDES

                      Within slides            Between slides
Interval            Observed   Expected      Observed   Expected
                      No.        No.           No.        No.
0      - 0.0201        0         2.9            1         1.5
0.0201 - 0.0404        5         2.9            4         1.5
0.0404 - 0.103        11         8.8            3         4.4
0.103  - 0.211        14        14.7           13         7.4
0.211  - 0.446        29        29.4            8        14.7
0.446  - 0.713        26        29.4           13        14.7
0.713  - 1.386        59        58.8           23        29.4
1.386  - 2.408        62        58.8           25        29.4
2.408  - 3.219        19        29.4           19        14.7
3.219  - 4.605        25        29.4            7        14.7
4.605  - 5.991        26        14.7           10         7.4
5.991  - 7.824         9         8.8            4         4.4
7.824  - 9.210         2         2.9            6         1.5
9.210  -               7         2.9            2         1.5
TOTAL                294       294            147       147

Mean value of S = 2.18                  Mean value of S = 2.16

P = 0.2                                 P = 0.1

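by enumerating $n_d$ leucocytes

$\mathrm{Var}\,\hat{p} = \dfrac{p(1 - p)}{n_d}$ (10)

and the coefficient of variation of $\hat{p}$ is given by

$V_d^2 = \dfrac{100^2 (1 - p)}{n_d\, p}$ (11)

In practice of course $\hat{p}$ will be used as the best available estimate of $p$
so that

$V_d^2 = \dfrac{100^2 (1 - \hat{p})}{n_d\, \hat{p}}$ (12)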


In routine counts at A.E.R.E., 200 cells are evaluated, so that
$n_d = 200$, and for a neutrophil count with $\hat{p} = 0.5$ the coefficient of
variation of the neutrophil count is 7%. The determination of the 
proportion of a less common cell is liable to a larger error. A monocyte 
ratio of 0.1, for example, would be determined with a coefficient of
variation of 21% by counting 200 cells. 
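
The two percentages quoted follow from the binomial form of the coefficient of variation. The sketch below assumes equation (12) in the form 100 × sqrt((1 - p)/(n p)), with p the estimated proportion and n the number of cells enumerated; the function name is illustrative.

```python
import math


def cv_proportion(p, n_cells):
    """Percentage coefficient of variation of a proportion p estimated
    from n_cells cells, under binomial sampling."""
    return 100.0 * math.sqrt((1.0 - p) / (n_cells * p))


cv_neutrophil = cv_proportion(0.5, 200)   # common cell type
cv_monocyte = cv_proportion(0.1, 200)     # less common cell type
```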


Error of absolute neutrophil or lymphocyte count 


The absolute neutrophil or lymphocyte count is obtained by multi- 
plying the total white-cell count by the appropriate proportion p found 
in the differential count. 

To a first approximation the coefficient of variation of the product 
of two independent variables is the square root of the sum of the squares 
of the coefficients of variation of the two variables.* It follows from 
equations (9) and (12) that the coefficient of variation of the absolute 
neutrophil count is given by 


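$V_a^2 = V_t^2 + V_d^2 = \dfrac{100^2}{n_c} + \dfrac{3.72^2}{n_h} + \dfrac{7.43^2}{n_p} + \dfrac{100^2 (1 - \hat{p})}{n_d\, \hat{p}}$ (13)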


The contributions of the various sources of error, and the reduction 
in error achieved by doubling up the various stages, are shown in 
Figure 2. Each component is expressed as a coefficient of variation, so 
that different components must be summed by squaring, adding, and
taking the square root. 
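
Combining the two coefficients of variation is a root-sum-of-squares calculation. The sketch below uses the routine-practice figure of 8.8 per cent for the total count and about 7.07 per cent for a neutrophil proportion of 0.5 on 200 cells; the independence of the two errors is the assumption stated in the text, and the function name is illustrative.

```python
import math


def cv_absolute_count(cv_total_count, cv_differential):
    """First-order coefficient of variation of the product of two
    independent estimates: square root of the sum of squared CVs."""
    return math.hypot(cv_total_count, cv_differential)


cv_abs_neutrophil = cv_absolute_count(8.8, 7.07)
```

Under these inputs the absolute neutrophil count carries a coefficient of variation of roughly 11 per cent.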


Discussion

It is clear from Figure 2 that it is of little use increasing indefinitely 
the number of cells counted in the leucocyte count unless the numbers 
of pipettes and chambers used are also increased. 

Devices, such as electronic cell counters, which enable 10,000 cells 
to be enumerated in less time than 100 can be counted by eye, are not 
sufficient by themselves to reduce the error indefinitely. 

The relatively large contribution of the differential count error is 
evident. Any method of special staining which enables the individual 
cells to be distinguished in the wet count, thus obviating the necessity 
for a differential count, is likely to be capable of greater accuracy. 

Although the errors of the white-cell count are large in comparison 
with most scientific estimations, this is not of great importance if, as 
will be shown in a sequel, random short term physiological variations 
in the white-cell content of human blood, are larger still. 


*The errors of the white cell count and differential count, considered as measurements on a certain 
individual at a certain time must clearly be independent, since the counts are deduced by quite distinct 
methods from different drops of blood. The white cell counts and differential counts of an individual 
at different times, and of different individuals, are not independent. There is a slight positive correla- 
tion between the white cell count and the proportion of neutrophils. 
ON THE DUAL OF SOME BALANCED INCOMPLETE
BLOCK DESIGNS


S. S. SHRIKHANDE
College of Science, Nagpur, India 
and 
University of Kansas, Lawrence 


1. Summary. The purpose of this note is to show that the existing
Balanced Incomplete Block Designs with parameters

(i) $v^*$, $b^*$, $r^*$, $k^*$, $\lambda^* = 1$
and
(ii) $v^* = \binom{k-1}{2}$, $b^* = \binom{k}{2}$, $r^* = k$, $k^* = k - 2$, $\lambda^* = 2$

can be dualised to give Partially Balanced Incomplete Block Designs
with only two types of associates. Easy methods for writing down the
designs thus obtained are also given.

2. Definition of a Partially Balanced Incomplete Block Design. 
(P.B.I.B.D.). An incomplete block design is said to be a P.B.I.B.D.
[1, 2] if it satisfies the following conditions.


(i) There are v varieties divided into b blocks of k units each,
different varieties being applied to the units in the same block.
(ii) Each variety occurs in r blocks.
(iii) There can be established an association relationship between
any two varieties satisfying the following requirements.

(a) Two varieties are either 1st, 2nd, ... or m-th associates.
(b) Each variety has exactly $n_i$ i-associates (i = 1, 2, ..., m).
(c) Given any two varieties which are i-associates the number
of varieties common to the j-th associate of the first and k-th
associate of the second is $p^i_{jk}$. Also $p^i_{jk} = p^i_{kj}$.
(iv) Two varieties which are i-associates occur together in exactly
$\lambda_i$ blocks.


The numbers $\lambda_i$ (i = 1, 2, ..., m) need not all be distinct. When
all the numbers $\lambda_i$ are equal we get a Balanced Incomplete Block Design
(B.I.B.D.).


3. Dual of a B.I.B.D. Let a B.I.B.D. with parameters v*, b*, r*,
k*, and λ* exist. Let the treatments and blocks of the design be
denoted by $T_1, T_2, \ldots, T_{v^*}$ and $B_1, B_2, \ldots, B_{b^*}$ respectively. Let
the block $B_i$ contain the treatments $T_{i_1}, T_{i_2}, \ldots, T_{i_{k^*}}$. Then a new
design, not necessarily balanced, can be obtained from this design by
letting the treatments and blocks of the original design become the
blocks and treatments respectively of the new design in which the
treatment numbered $i$ occurs in blocks numbered $i_1, i_2, \ldots, i_{k^*}$,
$i = 1, 2, \ldots, b^*$. Hence for the new design v = b*, b = v*, r = k*,
k = r*. This design is the dual of the original design [1].

As an illustration consider the design with parameters v* = 6,
b* = 10, r* = 5, k* = 3, λ* = 2. It is given by the following 10 blocks
given in order. (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 1), (5, 1, 2), (1, 3, 6),
(2, 4, 6), (3, 5, 6), (4, 1, 6) and (5, 2, 6). The design dual to this is
given by the following blocks (1, 4, 5, 6, 9), (1, 2, 5, 7, 10), (1, 2, 3, 6, 8),
(2, 3, 4, 7, 9), (3, 4, 5, 8, 10), (6, 7, 8, 9, 10). For this dual design, which
is no longer balanced, we have v = 10, b = 6, r = 3 and k = 5. The
first block of the dual design contains treatments 1, 4, 5, 6 and 9, for
the treatment numbered 1 in the original design occurred in blocks
numbered 1, 4, 5, 6 and 9. Other blocks of the dual design are similarly
formed and in particular the last block of the dual design corresponds 
to treatment numbered 6 of the original design which occurred in blocks 
numbered 6, 7, 8, 9 and 10. 
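
The dualisation in this illustration is mechanical and can be sketched in a few lines. The function below, whose name is illustrative, lists for each original treatment the numbers of the original blocks containing it; those lists are the blocks of the dual design.

```python
def dual_design(blocks):
    """Blocks of the dual: for each original treatment, the 1-based numbers
    of the original blocks in which it occurs."""
    treatments = sorted({t for block in blocks for t in block})
    return [tuple(j for j, block in enumerate(blocks, start=1) if t in block)
            for t in treatments]


original = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 1), (5, 1, 2),
            (1, 3, 6), (2, 4, 6), (3, 5, 6), (4, 1, 6), (5, 2, 6)]
dual = dual_design(original)
```

Applied to the ten blocks above, this returns the six blocks of five treatments listed in the text.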

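4. Dual of a B.I.B.D. with λ* = 1


Theorem 1. The dual of a B.I.B.D. with parameters

(4.1) $v^* = rk - k + 1$, $b^* = k(rk - k + 1)/r$, $r^* = k$, $k^* = r$, $\lambda^* = 1$

is a P.B.I.B.D. with parameters

(4.2) $v = k(rk - k + 1)/r$, $b = rk - k + 1$, $r = r$, $k = k$, $\lambda_1 = 1$, $\lambda_2 = 0$,
$n_1 = r(k - 1)$, $n_2 = (k - r)(k - 1)(r - 1)/r$,
$(p^1_{jk}) = \begin{pmatrix} (r - 1)^2 + k - 2 & (r - 1)(k - r) \\ (r - 1)(k - r) & (k - r)(r - 1)(k - r - 1)/r \end{pmatrix}$,
$(p^2_{jk}) = \begin{pmatrix} r^2 & r(k - r - 1) \\ r(k - r - 1) & (k - r)^2 + 2(r - 1) - k(k - 1)/r \end{pmatrix}$.
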
Proof. Let A be any block of the design (4.1). Since λ* = 1 no
other block of the design can have more than one treatment in common
with A. Let $m_0$ be the number of blocks $A_0$ each of which has no treatment
in common with A and $m_1$ be the number of blocks $A_1$ each having
one treatment in common with it. Then obviously

$m_0 + m_1 = (rk - k + 1)k/r - 1$
and
$m_1 = r(k - 1)$.

Hence $m_0 = (k - r)(k - 1)(r - 1)/r$.

In the dual design, therefore, with respect to any given treatment
the other treatments can be divided into two groups of size $n_1$ and $n_2$
respectively, where $n_1 = r(k - 1)$ and $n_2 = (k - r)(k - 1)(r - 1)/r$
such that the treatments of the first group occur $\lambda_1$ (= 1) times and those
of the second group occur $\lambda_2$ (= 0) times with the given treatment.

Let a noninitial treatment (i.e. not occurring in A) occur $y_0$ times
in $A_0$ and $y_1$ times in $A_1$. Then

$y_0 + y_1 = k$, $y_1 = r$.
Hence
$y_0 = k - r$.

Let B be a block of $A_0$. Let there be $z_0$ blocks $B_0$ and $z_1$ blocks $B_1$
in $A_0$. Then $z_0 + z_1 = (k - r)(k - 1)(r - 1)/r - 1$.

Since each of the r noninitial treatments of B must occur further
k - r - 1 times in the remaining blocks of $A_0$ and the k - r - 1 blocks
of $A_0$ containing a given treatment of B must be different from the
k - r - 1 blocks corresponding to any other treatment of B we must
have

$z_1 = r(k - r - 1)$.
Hence
$z_0 = (k - r)^2 + 2(r - 1) - k(k - 1)/r$.

Let C be a block of $A_1$. Let $A_0$ contain $u_0$ blocks $C_0$ and $u_1$ blocks
$C_1$. Then

$u_0 + u_1 = (k - r)(k - 1)(r - 1)/r$
and
$u_1 = (r - 1)(k - r)$
since in $A_0$ each of the noninitial treatments must occur (k - r) times.
Hence $u_0 = (k - r)(r - 1)(k - r - 1)/r$.


For the dual design the blocks A and B become 2-associates and
the number of treatments which are simultaneously 2-associates of these
two treatments of the dual design is obviously the number $z_0$ of blocks
$B_0$ each of which has no treatment in common with both A and B.
Thus $p^2_{22} = (k - r)^2 + 2(r - 1) - [k(k - 1)]/r$. Similarly for the
dual design the number of treatments which are 1-associates of the
treatment corresponding to A and simultaneously 2-associates of the
treatment corresponding to B is the number of blocks $B_1$ each of which
has no treatment in common with A and simultaneously one treatment
in common with B. Thus $p^2_{12} = r(k - r - 1)$.

It can similarly be verified that the values of the remaining param- 
eters are as indicated in (4.2). Thus the dual design is a P.B.I.B.D. 
with just two types of associates. 
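
The parameter formulas can be checked against the tabulated designs. The sketch below evaluates them for given r and k, assuming the expressions derived in the proof for $n_1$, $n_2$, $p^1_{12}$, $p^1_{22}$, $p^2_{12}$ and $p^2_{22}$, and obtaining $p^1_{11}$ and $p^2_{11}$ from the identities $n_1 - 1 = p^1_{11} + p^1_{12}$ and $n_1 = p^2_{11} + p^2_{12}$. The function name and the dictionary layout are illustrative.

```python
from fractions import Fraction


def dual_pbibd_parameters(r, k):
    """Parameters of the dual of a B.I.B.D. with r* = k, k* = r, lambda* = 1."""
    return {
        "v": Fraction(k * (r * k - k + 1), r),
        "b": r * k - k + 1,
        "n1": r * (k - 1),
        "n2": Fraction((k - r) * (k - 1) * (r - 1), r),
        # (p^1_11, p^1_12, p^1_22)
        "p1": ((r - 1) ** 2 + k - 2,
               (r - 1) * (k - r),
               Fraction((k - r) * (r - 1) * (k - r - 1), r)),
        # (p^2_11, p^2_12, p^2_22)
        "p2": (r * r,
               r * (k - r - 1),
               (k - r) ** 2 + 2 * (r - 1) - Fraction(k * (k - 1), r)),
    }


design_1 = dual_pbibd_parameters(r=3, k=6)
```

With r = 3 and k = 6 this gives v = 26, b = 13, $n_1$ = 15, $n_2$ = 10 and the triples (8, 6, 4) and (9, 6, 3), in agreement with design (1) of the table of designs.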


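Corollary. The dual of an unreduced B.I.B.D.
$v^* = n$, $b^* = \binom{n}{2}$, $r^* = n - 1$, $k^* = 2$, $\lambda^* = 1$

is a P.B.I.B.D. with parameters

$v = \binom{n}{2}$, $b = n$, $r = 2$, $k = n - 1$, $\lambda_1 = 1$, $\lambda_2 = 0$,
$n_1 = 2(n - 2)$, $n_2 = \binom{n-2}{2}$,
$(p^1_{jk}) = \begin{pmatrix} n - 2 & n - 3 \\ n - 3 & \binom{n-3}{2} \end{pmatrix}$, $(p^2_{jk}) = \begin{pmatrix} 4 & 2(n - 4) \\ 2(n - 4) & \binom{n-4}{2} \end{pmatrix}$.
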

Most of the designs obtained by the above theorem can be easily 
written by developing cyclically an initial set of difference sets [3] with 
respect to a suitable modulus. 

Theorem 2. Let $(a_{1,1}, \ldots, a_{1,r}), (a_{2,1}, \ldots, a_{2,r}), \ldots, (a_{m,1}, \ldots, a_{m,r})$,
where each $a_{i,j}$ is one of the numbers 1, 2, ..., v, form a set of m initial
difference sets (mod v) for the B.I.B.D.
with v* = v, b* = mv, r* = mr, k* = r, λ* = 1. Then the difference
set $((a_{1,1})_1, \ldots, (a_{1,r})_1; \ldots; (a_{m,1})_m, \ldots, (a_{m,r})_m)$ developed (mod v),
keeping the lower suffixes fixed, gives the P.B.I.B.D. dual to the above
design.

Proof. The design obtained by developing the latter difference set 
will be dual to the B.I.B.D. given by the set of m difference sets if 
any two blocks of the dual design have just one treatment in common. 
It is obvious that this condition is satisfied in virtue of the original set
giving rise to the B.I.B.D. 

As an illustration consider the design v* = 13, b* = 26, r* = 6, k* = 3,
λ* = 1. It is obtained by cyclically developing the two difference sets
(1, 3, 9) and (2, 6, 5) mod. 13. The set (1, 3, 9) gives rise to 13 blocks
by adding the numbers 0, 1, 2, ..., 12 respectively and reducing any
number which exceeds 13 by subtracting 13 from it. Thus the block
obtained by adding 10 to the difference set (1, 3, 9) gives (11, 13, 19)
which when reduced mod. 13 is (11, 13, 6). Hence the blocks obtained
from the set (1, 3, 9) are (1, 3, 9), (2, 4, 10), (3, 5, 11), (4, 6, 12), (5, 7, 13),
(6, 8, 1), (7, 9, 2), (8, 10, 3), (9, 11, 4), (10, 12, 5), (11, 13, 6), (12, 1, 7),
(13, 2, 8). Similarly the set (2, 6, 5) gives rise to another set of 13
blocks. These 26 blocks are the blocks of the above balanced design.
The dual design is given by the difference set $(1_1, 3_1, 9_1; 2_2, 6_2, 5_2)$,
which is to be developed cyclically mod. 13, keeping the lower suffixes
fixed. Thus the 13 blocks obtained from this set are
$(1_1, 3_1, 9_1; 2_2, 6_2, 5_2)$, $(2_1, 4_1, 10_1; 3_2, 7_2, 6_2)$, $(3_1, 5_1, 11_1; 4_2, 8_2, 7_2)$,
$(4_1, 6_1, 12_1; 5_2, 9_2, 8_2)$, $(5_1, 7_1, 13_1; 6_2, 10_2, 9_2)$, $(6_1, 8_1, 1_1; 7_2, 11_2, 10_2)$,
$(7_1, 9_1, 2_1; 8_2, 12_2, 11_2)$, $(8_1, 10_1, 3_1; 9_2, 13_2, 12_2)$, $(9_1, 11_1, 4_1; 10_2, 1_2, 13_2)$,
$(10_1, 12_1, 5_1; 11_2, 2_2, 1_2)$, $(11_1, 13_1, 6_1; 12_2, 3_2, 2_2)$, $(12_1, 1_1, 7_1; 13_2, 4_2, 3_2)$,
$(13_1, 2_1, 8_1; 1_2, 5_2, 4_2)$. We can identify
the 26 different symbols occurring here with the 26 treatments of the
dual design in the following manner. Put $x_1 = x$ and $x_2 = 13 + x$
for x = 1, 2, ..., 13. Thus the symbol $9_1$ stands for the treatment
numbered 9, whereas $9_2$ stands for treatment numbered 22. With this
identification the above 13 sets give the 13 blocks of the dual design.
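
The cyclic development with fixed suffixes, and the renumbering of the suffixed symbols, can be sketched as follows. Symbols are represented as (value, suffix) pairs and residues are written 1 to 13 as in the text; the function names are illustrative.

```python
from itertools import combinations


def develop(base, modulus):
    """Add 0, 1, ..., modulus - 1 to the values of the base block, reducing
    to the range 1..modulus and keeping each suffix fixed."""
    return [[((x + j - 1) % modulus + 1, s) for x, s in base]
            for j in range(modulus)]


def treatment_number(symbol, modulus):
    """Identify the symbol x with suffix y with treatment (y - 1) * modulus + x."""
    x, suffix = symbol
    return (suffix - 1) * modulus + x


base = [(1, 1), (3, 1), (9, 1), (2, 2), (6, 2), (5, 2)]
blocks = [[treatment_number(sym, 13) for sym in block]
          for block in develop(base, 13)]

# Any two blocks of the dual should have exactly one treatment in common.
common = {len(set(a) & set(b)) for a, b in combinations(blocks, 2)}
```

For this example `common` is `{1}`, each of the 26 treatments occurs in three blocks, and the symbol 9 with suffix 2 is numbered 22.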

5. The dual of B.I.B.D. with r* = k, k* = k - 2, λ* = 2.
Theorem 3. The dual of the B.I.B.D. with $v^* = \binom{k-1}{2}$, $b^* = \binom{k}{2}$,
r* = k, k* = k - 2, λ* = 2 is a P.B.I.B.D. with parameters


$v = \binom{k}{2}$, $b = \binom{k-1}{2}$, $r = k - 2$, $k = k$,
$\lambda_1 = 1$, $\lambda_2 = 2$, $n_1 = 2(k - 2)$, $n_2 = \binom{k-2}{2}$,
$(p^1_{jk}) = \begin{pmatrix} k - 2 & k - 3 \\ k - 3 & \binom{k-3}{2} \end{pmatrix}$, $(p^2_{jk}) = \begin{pmatrix} 4 & 2(k - 4) \\ 2(k - 4) & \binom{k-4}{2} \end{pmatrix}$.


Proof. The theorem can be proved in the same way as Theorem 1
by using results due to Hussain [4].

It is interesting to note that the above P.B.I.B.D. can also be
obtained by omitting all the blocks containing any particular treatment
from the B.I.B.D. with
$v^* = b^* = \binom{k}{2} + 1$, $r^* = k^* = k$, $\lambda^* = 2$.

6. Table of Designs obtained. Solutions of designs obtained in this 
paper are given in a tabular form. Some of the difference sets used 
are obtained from those given by Rao [5]. In developing these difference
sets the lower suffix is kept constant. For those designs which
are not explicitly given, i.e. (4) thru (13) and (18) thru (23), use has
been made of the corresponding balanced designs given by Fisher and 
Yates [6]. 


Explanation of the table. The object of this table is to give the designs
in as compact a form as possible. Design (4) of the table is the dual
of the design v = 9, b = 12, r = 4, k = 3, λ = 1 and could therefore
have been obtained from it, but it is easily shown that the same
design is obtained by omitting the blocks containing one treatment
from the design v = b = 13, r = k = 4, λ = 1. The same is true for
designs (5) thru (9) and (21) thru (23). For designs (10) thru (13) it
has not been possible to put
the solution in a more compact form than is given there. Designs (1)
thru (3) and (14) thru (17) are obtained by developing cyclically,
keeping the lower suffixes. This development is explained in the
explanation of Theorem 2. In general the problem of identification can
be explained as follows. If we have a difference set mod p in which
there are q suffixes, we get in all pq different symbols after cyclic
development. Put x_y = (y - 1)p + x; y = 1, 2, ..., q. Thus we have
treatments numbered 1, 2, ..., pq corresponding to these different
symbols.
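The identification rule can be tried out in a few lines of Python. The sketch below is only an illustration: the function name `develop` and the (residue, suffix) encoding of a symbol are assumptions made here, and residue p is taken to stand for 0, so that residues run over 1, ..., p as in the numbering x_y = (y - 1)p + x. It develops the initial block (1_1, 3_1, 9_1; 2_2, 6_2, 5_2) mod 13 and tallies replications and pair occurrences.

```python
from collections import Counter
from itertools import combinations


def develop(initial_block, p):
    """Cyclically develop an initial block given as (residue, suffix) pairs.

    Residues run over 1..p (p plays the role of 0) and suffixes over 1..q.
    The symbol x_y is numbered (y - 1) * p + x, as in the text.
    """
    blocks = []
    for shift in range(p):
        block = []
        for x, y in initial_block:
            x_shifted = (x - 1 + shift) % p + 1    # add `shift` mod p, staying in 1..p
            block.append((y - 1) * p + x_shifted)  # the suffix y is kept fixed
        blocks.append(block)
    return blocks


# Initial block (1_1, 3_1, 9_1; 2_2, 6_2, 5_2), developed mod 13.
initial = [(1, 1), (3, 1), (9, 1), (2, 2), (6, 2), (5, 2)]
blocks = develop(initial, 13)

# How often each treatment occurs, and how often each pair occurs together.
replications = Counter(t for block in blocks for t in block)
pair_counts = Counter(
    pair for block in blocks for pair in combinations(sorted(block), 2)
)

print(blocks[-1])                   # last block, written in treatment numbers
print(set(replications.values()))   # replication number(s)
print(set(pair_counts.values()))    # occurrence counts of pairs that meet at all
```

Pairs that never occur together do not appear in `pair_counts`; they are the second associates, with λ equal to 0.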
TABLE 1

       v, b, r, k; λ1, λ2; n1, n2;
       p^1_11, p^1_12, p^1_22; p^2_11, p^2_12, p^2_22      Solution

(1)    26, 13, 3, 6; 1, 0; 15, 10;      Develop (mod 13)
       8, 6, 4; 9, 6, 3                 (1_1, 3_1, 9_1; 2_2, 6_2, 5_2)
(2)    57, 19, 3, 9; 1, 0; 24, 32;      Develop (mod 19)
       11, 12, 20; 9, 15, 16            (1_1, 7_1, 11_1; 2_2, 3_2, 14_2; 4_3, 6_3, 9_3)
(3)    82, 41, 5, 10; 1, 0; 45, 36;     Develop (mod 41)
       24, 20, 16; 25, 20, 15           (1_1, 10_1, 16_1, 18_1, 37_1; 5_2, 8_2, 9_2, 21_2, 39_2)
(4)    12, 9, 3, 4; 1, 0; 9, 2;         Omit the blocks containing treatment 13 from
       6, 2, 0; 9, 0, 1                 the design v = b = 13, r = k = 4, λ = 1
(5)    20, 16, 4, 5; 1, 0; 16, 3;       Omit the blocks containing treatment 21 from
       12, 3, 0; 16, 0, 2               the design v = b = 21, r = k = 5, λ = 1
(6)    30, 25, 5, 6; 1, 0; 25, 4;       Omit the blocks containing treatment 31 from
       20, 4, 0; 25, 0, 3               the design v = b = 31, r = k = 6, λ = 1
(7)    56, 49, 7, 8; 1, 0; 49, 6;       Omit the blocks containing treatment 57 from
       42, 6, 0; 49, 0, 5               the design v = b = 57, r = k = 8, λ = 1
(8)    72, 64, 8, 9; 1, 0; 64, 7;       Omit the blocks containing treatment 73 from
       56, 7, 0; 64, 0, 6               the design v = b = 73, r = k = 9, λ = 1
(9)    90, 81, 9, 10; 1, 0; 81, 8;      Omit the blocks containing treatment 91 from
       72, 8, 0; 81, 0, 7               the design v = b = 91, r = k = 10, λ = 1
(10)   35, 15, 3, 7; 1, 0; 18, 16;      Dualise the design
       9, 8, 8; 9, 9, 6                 v = 15, b = 35, r = 7, k = 3, λ = 1
(11)   70, 21, 3, 10; 1, 0; 27, 42;     Dualise the design
       12, 14, 28; 9, 18, 23            v = 21, b = 70, r = 10, k = 3, λ = 1
(12)   50, 25, 4, 8; 1, 0; 28, 21;      Dualise the design
       15, 12, 9; 16, 12, 8             v = 25, b = 50, r = 8, k = 4, λ = 1
(13)   63, 28, 4, 9; 1, 0; 32, 30;      Dualise the design
       16, 15, 15; 16, 16, 13           v = 28, b = 63, r = 9, k = 4, λ = 1
(14)   10, 5, 2, 4; 1, 0; 6, 3;         Develop (mod 5) the block
       3, 2, 1; 4, 2, 0                 (1_1, 4_1, 2_2, 3_2)
(15)   21, 7, 2, 6; 1, 0; 10, 10;       Develop (mod 7) the block
       5, 4, 6; 4, 6, 3                 (1_1, 6_1, 2_2, 5_2, 3_3, 4_3)
(16)   36, 9, 2, 8; 1, 0; 14, 21;       Develop (mod 9) the block
       7, 6, 15; 4, 10, 10              (1_1, 8_1, 2_2, 7_2, 3_3, 6_3, 4_4, 5_4)
(17)   55, 11, 2, 10; 1, 0; 18, 36;     Develop (mod 11) the block
       9, 8, 28; 4, 14, 21              (1_1, 10_1, 2_2, 9_2, 3_3, 8_3, 4_4, 7_4, 5_5, 6_5)
(18)   15, 6, 2, 5; 1, 0; 8, 6;         Dualise the design
       4, 3, 3; 4, 4, 1                 v = 6, b = 15, r = 5, k = 2, λ = 1
(19)   28, 8, 2, 7; 1, 0; 12, 15;       Dualise the design
       6, 5, 10; 4, 8, 6                v = 8, b = 28, r = 7, k = 2, λ = 1
(20)   45, 10, 2, 9; 1, 0; 16, 28;      Dualise the design
       8, 7, 21; 4, 12, 15              v = 10, b = 45, r = 9, k = 2, λ = 1
(21)   10, 6, 3, 5; 1, 2; 6, 3;         Omit the blocks containing treatment 11 from
       3, 2, 1; 4, 2, 0                 the design v = b = 11, r = k = 5, λ = 2
(22)   15, 10, 4, 6; 1, 2; 8, 6;        Omit the blocks containing treatment 16 from
       4, 3, 3; 4, 4, 1                 the design v = b = 16, r = k = 6, λ = 2
(23)   36, 28, 7, 9; 1, 2; 14, 21;      Omit the blocks containing treatment 37 from
       7, 6, 15; 4, 10, 10              the design v = b = 37, r = k = 9, λ = 2
LATINIZED RECTANGULAR LATTICES
Boyd Harshbarger and Lyle L. Davis


Virginia Agricultural Experiment Station of the 
Virginia Polytechnic Institute 


INTRODUCTION 


Rectangular Lattices, Triple Rectangular Lattices and Near Balance
Rectangular Lattices were introduced by Harshbarger (1947,
1949, 1951). Other recent studies on the Rectangular Lattice designs
are included among the references. These designs treat specifically
the cases where the number of varieties or treatments is the product
of two consecutive integers, k(k - 1), but they can be extended to
other cases where the integers are not consecutive. The Near Balance
Rectangular Lattice Designs extend the Rectangular Lattice Designs to
the case of k replications. With this extension the formulas for the
Near Balance Rectangular Lattice Designs are less complex, and the
designs more efficient, than the earlier designs. As with the earlier
designs, the analysis includes the recovery of inter-block information.
The time for calculations in the latest design is much reduced.

To construct the Near Balance Rectangular Lattice designs, start
with a set of k - 1 mutually orthogonal k × k Latin squares in which
the first column is in the standard order, e.g. for k = 4


    A B C D      A C D B      A D B C
    B A D C      B D C A      B C A D
    C D A B      C A B D      C B D A
    D C B A      D B A C      D A C B




74 BIOMETRICS, MARCH 1952 


The first set of blocks is obtained by writing down the k(k - 1) treatments
in any manner in a k × (k - 1) rectangle, e.g.


     1   2   3
     4   6   5
     9   7   8
    12  11  10


Now cancel the first column from the three Latin squares, and 
superpose the rectangle of treatments on the resulting Latin rectangles. 
The blocks of the second set are the treatments falling on the letters 
A, B, C, D respectively of the first Latin rectangle and so on for the 
other sets. The resulting design which is used in this paper is 


TABLE I

    Set 1         Set 2         Set 3         Set 4
    1  2  3       4  7 10       5  9 11       6  8 12
    4  6  5       1  8 11       3  7 12       2  9 10
    9  7  8       2  5 12       1  6 10       3  4 11
   12 11 10       3  6  9       2  4  8       1  5  7


This method of construction is quite general and a further advantage 
is that the columns of the first set give the treatments which do not 
occur together in a block. In applying the design the treatments are 
assigned at random to the numbers. 
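The construction just described can be sketched in Python as follows. This is an illustration rather than the authors' procedure: the function name is hypothetical, the three 4 × 4 squares are written out as they are read here from the printed example (an assumption where the copy is unclear), and each block is sorted for display.

```python
def latinized_rectangular_lattice(squares, rectangle):
    """Build the k sets of blocks from k - 1 Latin squares and a k x (k - 1) rectangle."""
    k = len(rectangle)
    sets = [[sorted(row) for row in rectangle]]    # first set: rows of the rectangle
    letters = [row[0] for row in squares[0]]       # first column, in standard order
    for square in squares:
        latin_rect = [row[1:] for row in square]   # cancel the first column
        blocks = []
        for letter in letters:
            # A block collects the treatments falling on one letter.
            blocks.append(sorted(
                rectangle[i][j]
                for i in range(k)
                for j in range(k - 1)
                if latin_rect[i][j] == letter
            ))
        sets.append(blocks)
    return sets


# Assumed reading of the three mutually orthogonal 4 x 4 Latin squares.
squares = [
    ["ABCD", "BADC", "CDAB", "DCBA"],
    ["ACDB", "BDCA", "CABD", "DBAC"],
    ["ADBC", "BCAD", "CBDA", "DACB"],
]
rectangle = [[1, 2, 3], [4, 6, 5], [9, 7, 8], [12, 11, 10]]

for number, blocks in enumerate(latinized_rectangular_lattice(squares, rectangle), start=1):
    print("Set", number, blocks)
```

Because the block for letter A is listed first, then B, C and D, block i of every set lines up with row i of the layout, which is what the row classification of the Latinized design relies on.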


CONSTRUCTION OF LATINIZED RECTANGULAR LATTICE DESIGNS 


The analysis of variance for the Rectangular Lattice Designs sepa- 
rates the variability not attributable to error into two assignable causes, 
that for replications and that for varieties. The block effects are con- 
ceived of as random variables and are entangled with variety effects. 
The Latinized Rectangular Lattice is a special case of the Near Balance 
Rectangular Lattice Design which allows a third class of assignable 
effects to be estimated and tested provided that the interactions with 
treatments are negligible. 

If the blocks are systematically arranged in rows and sets as in 
Table I with each row extending across all sets, it can be seen that 
each variety is represented once and once only in each row as well as 
in each set. A row will thus give an estimate of an additional assignable 
cause which is orthogonal to the between set variation. In general, a 
Latinized Rectangular Lattice Design of k(k — 1) varieties will have 
k rows and k sets which are orthogonal to one another like the rows 
and columns of a Latin square. Each row and each set contains k
incomplete blocks of k - 1 varieties each. The design provides estimates
of the variations due to sets, rows, interaction of rows by sets, varieties, 
and error. 
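The property that every variety occurs once in each row and once in each set can be checked directly. The short Python sketch below assumes the 4 × 3 layout of the twelve-variety example (sets, then blocks in row order, then varieties); the variable and function names are illustrative only.

```python
# Layout of the example: design[h][i] is the block in set h + 1 and row i + 1.
design = [
    [[1, 2, 3], [4, 6, 5], [9, 7, 8], [12, 11, 10]],
    [[4, 7, 10], [1, 8, 11], [2, 5, 12], [3, 6, 9]],
    [[5, 9, 11], [3, 7, 12], [1, 6, 10], [2, 4, 8]],
    [[6, 8, 12], [2, 9, 10], [3, 4, 11], [1, 5, 7]],
]
all_varieties = list(range(1, 13))


def varieties_in_set(h):
    """All varieties appearing in set h (0-based), sorted."""
    return sorted(v for block in design[h] for v in block)


def varieties_in_row(i):
    """All varieties appearing in row i (0-based), i.e. block i of every set."""
    return sorted(v for blocks in design for v in blocks[i])


each_set_complete = all(varieties_in_set(h) == all_varieties for h in range(4))
each_row_complete = all(varieties_in_row(i) == all_varieties for i in range(4))
print(each_set_complete, each_row_complete)
```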

In the Analysis of Variance the between-set or between-replication
term is not changed from the Rectangular Lattice and carries
k - 1 degrees of freedom. The between-blocks term, having k(k - 1)
degrees of freedom in the Rectangular Lattice Designs, has
now been separated into two components: between rows, with k - 1
degrees of freedom, and the interaction of rows with sets, with
(k - 1) × (k - 1) degrees of freedom. The between-rows and between-sets
variances are each independent of the variance between varieties, and
only the row by set interaction is confounded with this term. The
interaction variance with varietal effects eliminated is easily obtained
by the usual method employed in lattice designs.


ANALYSIS OF LATINIZED RECTANGULAR LATTICE DESIGNS 


The mathematics for these designs is similar to that used in the 
development of the Rectangular Lattices. In many problems it is more 
realistic to use only an intra-block error for a test of significance and 
the correction of the varieties. In other cases where randomization can 
be assumed for either the row or set effects, it would be reasonable to 
use the row by set interaction to give an estimate of an inter-block error. 

The added classification of the Latinized Rectangular Lattice De- 
sign makes it applicable to a wide scope of experimentation ranging 
from taste-testing and biological experimentation to industrial experi- 
mentation. 

The symbols used in the discussion are defined as follows:


1. R_h is the sum of readings of the varieties for set h.

2. B_hi is the total of the readings of the varieties from the incomplete
block i of set h.

3. T_hi is the sum of the readings from all sets of the varieties listed in
incomplete block i of set h.

4. V_j is the sum of the readings from all sets of the variety with
subscript j.

5. G is the grand total of the readings for all the varieties.

6. y_j is the reading of a variety with subscript j for a particular set
and a particular incomplete block.

7. k is the number of incomplete blocks in a set.

8. S_hi is the sum of variety estimates in set h and incomplete block i.


The Analysis of Variance for the Latinized Rectangular Lattice 
Design is given in Table III. 
ESTIMATION OF THE INTRA-BLOCK ERROR VARIANCE AND THE
CALCULATION OF ADJUSTED VARIETY MEANS

An estimate of the intra-block error variance which may be denoted 
by Q, is obtained from item (6) in the Analysis of Variance Table. 

The adjusting of the variety means using intra-block information 
is accomplished by calculating certain constants. These constants, for 
which the formulas are given in Table IV, are then subtracted from 
the variety means. If the variety means are arranged in the order of 
set 1, the first group of constants are subtracted from the means ac- 
cording to rows and the second group according to the sets. Then the 
variety means adjusted in this way are arranged as in set 3 and the 
constants of the third group are subtracted according to rows. After 
this adjustment the variety means are arranged as in set 4 and are 
adjusted according to rows by the constants of a fourth group. This 
procedure of adjusting variety means is continued until all the sets and 
groups of constants have been used. The means appearing in the last 
set are the completely adjusted variety means. 

Another method is to list the variety means and then subtract from
each the sum of the constants (C_hi) for the set and row to which the
variety belongs.


TABLE IV 
CONSTANTS 


Group I 


TESTS FOR DIFFERENTIATION BETWEEN VARIETIES 


A test of significance for the overall differentiation between varieties
can be made by using the adjusted sum of squares for blocks (4) and
the unadjusted sum of squares for varieties (5), together
with the unadjusted sum of squares for blocks. The latter is calculated
by

\[ \frac{1}{k-1} \sum_{h} \sum_{i} B_{hi}^2 - \frac{G^2}{k^2 (k-1)}. \qquad (8) \]

Then the adjusted sum of squares of the varieties is given by

\[ (4) + (5) - (8). \qquad (9) \]


The mean square of (9) can be compared with the error term for a test 
of significance between varieties. 

For this design there are two simple formulas for estimating the 
standard error depending upon whether two varieties appear together 
in any one incomplete block or do not appear together. 

The estimated standard error of the difference between the means 
of two varieties occurring together in an incomplete block is 


2Q [ k—1 | 

1+ — 2) (10) 
The estimated standard error of the difference between the means of 

two varieties not occurring together in an incomplete block is 


E + (11) 


The two above formulas can be combined with appropriate weights 
to give the average estimated standard error which is 


NUMERICAL ANALYSIS 


To illustrate numerically the method of analysis for the Latinized 
Rectangular Lattice Designs, the analysis of an experiment measuring 
the color intensities of blends* of apple sauce is presented. 

Since twelve blends of apple sauce were to be tested in the experiment,
a 4 × 3 Latinized Rectangular Lattice was used. The experiment
was set up as in Table I. The blends were randomized within each
incomplete block and the incomplete blocks were randomized subject
to row and set restrictions.

The color intensities were measured with a Photovolt Reflectometer,
and the readings in light units expressed on the machine were tabulated
and compiled for computational purposes as shown in Table VI. The
upper figure in each cell refers to the number corresponding to the
blend, while the lower figure is the light intensity.

*The names of the blends used in these studies may be secured by writing to the Fruit and
Vegetable and Byproducts Laboratory, Virginia Agricultural Experiment Station, Blacksburg, Virginia.

In this experiment the sets were subjected to different times of
storage, and the new feature of the design was employed by using the
rows to measure the effects of different concentrations of cinnamon, as
shown in Table V.


TABLE V
STORAGE TIMES AND CINNAMON CONCENTRATION

        Storage Time              Cinnamon Concentration
Set       (hours)        Row        (gms./500 gms.)
 1           0            1              0
 2          24            2              1.25
 3          48            3              2.50
 4          72            4              3.75


The totals of the readings for each of the blends are arranged in 
Table VII following the order of the readings as listed in Table VI 
respectively. 

The calculations for the analysis of variance follow from the formulas 
given in Table III. In order to compute the sum of squares for rows 
and the interaction, it is convenient to form Table VIII and Table IX 
showing the row totals. 


TABLE VI
READINGS OF COLOR INTENSITY BY COLUMNS (REPLICATIONS)


COLUMN 1
Blocks                              B_1i
          1       2       3
(1)     15.5    15.0    16.0        46.5
          4       6       5
(2)     11.5    13.5    17.0        42.0
          9       7       8
(3)     16.5    15.0    12.0        43.5
         12      11      10
(4)     10.0    12.0    13.0        35.0
TOTAL (R_1)                        167.0

COLUMN 2
Blocks                              B_2i
          4       7      10
(1)     22.5    19.5    17.5        59.5
          1       8      11
(2)     14.0    15.0    13.0        42.0
          2       5      12
(3)     12.5    15.0    11.5        39.0
          3       6       9
(4)     10.0    11.5    15.0        36.5
TOTAL (R_2)                        177.0

COLUMN 3
Blocks                              B_3i
          5       9      11
(1)     21.5    22.5    16.5        60.5
          3       7      12
(2)     12.5    16.0    12.0        40.5
          1       6      10
(3)     13.0    13.0    13.5        39.5
          2       4       8
(4)     11.0    12.5    11.0        34.5
TOTAL (R_3)                        175.0

COLUMN 4
Blocks                              B_4i
          6       8      12
(1)     16.5    15.0    14.5        46.0
          2       9      10
(2)     13.5    19.0    12.5        45.0
          3       4      11
(3)     10.0    15.0    10.0        35.0
          1       5       7
(4)     10.5    12.5    12.5        35.5
TOTAL (R_4)                        161.5
TABLE VII
TOTALS OF READINGS OF COLOR INTENSITY


Arranged as in set 1, Table VI
Blocks                              T_1i
          1       2       3
(1)     53.0    52.0    48.5       153.5
          4       6       5
(2)     61.5    54.5    66.0       182.0
          9       7       8
(3)     73.0    63.0    53.0       189.0
         12      11      10
(4)     48.0    51.5    56.5       156.0
TOTAL                              680.5

Arranged as in set 2, Table VI
Blocks                              T_2i
          4       7      10
(1)     61.5    63.0    56.5       181.0
          1       8      11
(2)     53.0    53.0    51.5       157.5
          2       5      12
(3)     52.0    66.0    48.0       166.0
          3       6       9
(4)     48.5    54.5    73.0       176.0
TOTAL                              680.5

Arranged as in set 3, Table VI
Blocks                              T_3i
          5       9      11
(1)     66.0    73.0    51.5       190.5
          3       7      12
(2)     48.5    63.0    48.0       159.5
          1       6      10
(3)     53.0    54.5    56.5       164.0
          2       4       8
(4)     52.0    61.5    53.0       166.5
TOTAL                              680.5

Arranged as in set 4, Table VI
Blocks                              T_4i
          6       8      12
(1)     54.5    53.0    48.0       155.5
          2       9      10
(2)     52.0    73.0    56.5       181.5
          3       4      11
(3)     48.5    61.5    51.5       161.5
          1       5       7
(4)     53.0    66.0    63.0       182.0
TOTAL                              680.5
TABLE VIII
4B_hi - T_hi

 i    4B_1i - T_1i    4B_2i - T_2i    4B_3i - T_3i    4B_4i - T_4i
 1        32.5            57.0            51.5            28.5
 2       -14.0            10.5             2.5            -1.5
 3       -15.0           -10.0            -6.0           -21.5
 4       -16.0           -30.0           -28.5           -40.0
TABLE IX
ROW TOTALS

(B_1i + B_2i + B_3i + B_4i)

 i      Total
 1      212.5
 2      169.5
 3      157.0
 4      141.5


The analysis of variance table for the experiment is shown in Table X. 

By (9) the adjusted sum of squares for varieties is 115.158. 
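As a check on the arithmetic, the unadjusted sums of squares can be recomputed from the readings of Table VI. The Python sketch below is an illustration under stated assumptions: the divisors follow the usual analysis-of-variance pattern (each set and row total is a sum of k(k - 1) readings, each block total of k - 1, each variety total of k), and the adjusted sum of squares for blocks, 267.26, is copied from the analysis of variance table rather than derived. Variable names are hypothetical.

```python
# Table VI: for each set (column), four blocks of (blend, reading) pairs.
readings = [
    [[(1, 15.5), (2, 15.0), (3, 16.0)], [(4, 11.5), (6, 13.5), (5, 17.0)],
     [(9, 16.5), (7, 15.0), (8, 12.0)], [(12, 10.0), (11, 12.0), (10, 13.0)]],
    [[(4, 22.5), (7, 19.5), (10, 17.5)], [(1, 14.0), (8, 15.0), (11, 13.0)],
     [(2, 12.5), (5, 15.0), (12, 11.5)], [(3, 10.0), (6, 11.5), (9, 15.0)]],
    [[(5, 21.5), (9, 22.5), (11, 16.5)], [(3, 12.5), (7, 16.0), (12, 12.0)],
     [(1, 13.0), (6, 13.0), (10, 13.5)], [(2, 11.0), (4, 12.5), (8, 11.0)]],
    [[(6, 16.5), (8, 15.0), (12, 14.5)], [(2, 13.5), (9, 19.0), (10, 12.5)],
     [(3, 10.0), (4, 15.0), (11, 10.0)], [(1, 10.5), (5, 12.5), (7, 12.5)]],
]
k = 4
n = k * k * (k - 1)                                  # 48 readings in all

B = [[sum(y for _, y in block) for block in s] for s in readings]  # block totals B_hi
R = [sum(totals) for totals in B]                                   # set totals R_h
rows = [sum(B[h][i] for h in range(k)) for i in range(k)]           # row totals
V = {}                                                              # variety totals V_j
for s in readings:
    for block in s:
        for blend, y in block:
            V[blend] = V.get(blend, 0.0) + y
G = sum(R)
cf = G * G / n                                       # correction for the mean

ss_total = sum(y * y for s in readings for block in s for _, y in block) - cf
ss_sets = sum(r * r for r in R) / (k * (k - 1)) - cf
ss_rows = sum(r * r for r in rows) / (k * (k - 1)) - cf
ss_blocks_unadj = sum(b * b for totals in B for b in totals) / (k - 1) - cf  # item (8)
ss_var_unadj = sum(v * v for v in V.values()) / k - cf                       # item (5)

ss_blocks_adj = 267.26   # item (4): copied from the analysis of variance table
ss_var_adj = ss_blocks_adj + ss_var_unadj - ss_blocks_unadj                  # item (9)

for name, value in [("sets", ss_sets), ("rows", ss_rows),
                    ("blocks, unadjusted", ss_blocks_unadj),
                    ("varieties, unadjusted", ss_var_unadj),
                    ("total", ss_total), ("varieties, adjusted", ss_var_adj)]:
    print(f"{name:24s}{value:10.3f}")
```

The adjusted interaction and the error line are not reproduced here, since they depend on the lattice adjustment itself.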

In adjusting the variety means, the correction terms are calculated 
from Table IV. The correction terms are listed in Table XI. The un- 
adjusted variety means, the appropriate corrections applied to each, and 
the resulting adjusted variety means are shown in Table XII. 
TABLE X
ANALYSIS OF VARIANCE OF A LATINIZED RECTANGULAR LATTICE EXPERIMENT

Source of                       Degrees of    Sum of      Mean
Variation                       Freedom       Squares     Squares
Sets (storage time)                  3          12.93       4.311
Rows (levels of cinnamon)            3         232.31      77.436
Interactions, Rows by Sets           9          22.02       2.446
Blocks (adjusted)                   15         267.26      17.817
Varieties (unadjusted)              11         159.81      14.528
Error                               21          28.18       1.342 (Q)
Total                               47         455.25


The error mean square 1.3420 is an estimate of Q, the intra-block 
variance. 

The estimated standard error of the difference between the means 
of the two varieties 

(1) occurring in an incomplete block is 


(2) not occurring together in an incomplete block is 


E = 1.0032 


(3) and the average estimated standard error is 


E = .9724


TABLE XI
CORRECTION TERMS

 i       C_1i       C_2i       C_3i       C_4i
 1      1.016      1.781      1.609       .891
 2      -.438       .328       .078      -.047
 3      -.469      -.312      -.188      -.672
 4      -.500      -.938      -.891     -1.250
TABLE XII
UNADJUSTED AND ADJUSTED VARIETY MEANS

Variety    Mean      Adjustment                          Adjusted Mean
V_1       13.250    - C_11 - C_22 - C_33 - C_44    =     13.344
V_2       13.000    - C_11 - C_23 - C_34 - C_42    =     13.234
V_3       12.125    - C_11 - C_24 - C_32 - C_43    =     12.641
V_4       15.375    - C_12 - C_21 - C_34 - C_43    =     15.595
V_5       16.500    - C_12 - C_23 - C_31 - C_44    =     16.891
V_6       13.625    - C_12 - C_24 - C_33 - C_41    =     14.298
V_7       15.750    - C_13 - C_21 - C_32 - C_44    =     15.610
V_8       13.250    - C_13 - C_22 - C_34 - C_41    =     13.391
V_9       18.250    - C_13 - C_24 - C_31 - C_42    =     18.095
V_10      14.125    - C_14 - C_21 - C_33 - C_42    =     13.079
V_11      12.875    - C_14 - C_22 - C_31 - C_43    =     12.110
V_12      12.000    - C_14 - C_23 - C_32 - C_41    =     11.843
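The second method of adjustment (subtracting from each variety mean the sum of the constants for the blocks in which the variety appears) can be sketched in Python. One point is an assumption made here and not a formula from the paper: each constant is taken as (4B_hi - T_hi)/32, the divisor 32 being inferred from the tabulated values (for instance 32.5/32 = 1.016). The data are the readings of Table VI, and the names are illustrative.

```python
# Table VI: for each set (column), four blocks of (blend, reading) pairs.
readings = [
    [[(1, 15.5), (2, 15.0), (3, 16.0)], [(4, 11.5), (6, 13.5), (5, 17.0)],
     [(9, 16.5), (7, 15.0), (8, 12.0)], [(12, 10.0), (11, 12.0), (10, 13.0)]],
    [[(4, 22.5), (7, 19.5), (10, 17.5)], [(1, 14.0), (8, 15.0), (11, 13.0)],
     [(2, 12.5), (5, 15.0), (12, 11.5)], [(3, 10.0), (6, 11.5), (9, 15.0)]],
    [[(5, 21.5), (9, 22.5), (11, 16.5)], [(3, 12.5), (7, 16.0), (12, 12.0)],
     [(1, 13.0), (6, 13.0), (10, 13.5)], [(2, 11.0), (4, 12.5), (8, 11.0)]],
    [[(6, 16.5), (8, 15.0), (12, 14.5)], [(2, 13.5), (9, 19.0), (10, 12.5)],
     [(3, 10.0), (4, 15.0), (11, 10.0)], [(1, 10.5), (5, 12.5), (7, 12.5)]],
]
k = 4
DIVISOR = 32.0   # ASSUMPTION: inferred from the tabulated constants, not a stated formula

block_totals = [[sum(y for _, y in block) for block in s] for s in readings]
variety_totals = {}
for s in readings:
    for block in s:
        for blend, y in block:
            variety_totals[blend] = variety_totals.get(blend, 0.0) + y

# correction[h][i] = (k * B_hi - T_hi) / DIVISOR, where T_hi sums the variety totals
# of the blends listed in block i of set h.
correction = [
    [(k * block_totals[h][i]
      - sum(variety_totals[blend] for blend, _ in readings[h][i])) / DIVISOR
     for i in range(k)]
    for h in range(k)
]

adjusted = {}
for blend, total in variety_totals.items():
    terms = sum(correction[h][i]
                for h in range(k) for i in range(k)
                if blend in [b for b, _ in readings[h][i]])
    adjusted[blend] = total / k - terms      # mean minus the sum of its constants

for blend in sorted(adjusted):
    print(blend, round(variety_totals[blend] / k, 3), round(adjusted[blend], 3))
```

Since the constants are not rounded before being summed in this sketch, a result may differ from a hand calculation in the third decimal place.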
QUERIES
George W. Snedecor, Editor


QUERY 92: In order to determine the precision of a chemical
assay method as, for instance, a hormone determination in urine,
it is a common practice to perform duplicate analyses on the same 
urine specimen. The individual determinations in each pair of duplicates 
have no order in time and are indistinguishable with respect to any other 
classification if they are done by the same operator, on the same day, 
with the same technique. Consequently, no sign can be attached to 
duplicate differences, they have all to be taken as absolute (positive) 
values. 

How should such duplicate differences be used to characterize and to 
test the precision of the method? 

The (to me) obvious Null hypothesis that the mean difference = 0 
cannot be tested by ordinary procedures, since all the differences are 
greater than zero by definition. When plotted in a histogram, the 
distribution of duplicate differences looks like the positive tail of a 
normal distribution. If the actual data could be shown not to contradict 
this “positive normal tail” hypothesis, would this be equivalent to the 
(desired) statement that “duplicates don’t differ significantly” (at a 
certain specified probability level)? And how could it be shown? 
Alternatively, would it be reasonable to ask for tests of significance for 
the difference between a sample mean (of positive duplicate differences) 
and the one-tail mean of the normal distribution? Or, finally, are these 
questions unnecessary complications of the real issue, to wit, using 
S(d²)/2N as an estimate of the population variance within duplicates,
a procedure which has to be used anyway when dealing with triplicates 
or higher replicates? 

To characterize further the kind of data I have in mind, I should 
perhaps add, 1. that 20-50 pairs of duplicates are plenty, 2. that I am 
only concerned with the reproducibility of results, not with their
accuracy; thus, the mean of each pair of duplicates is irrelevant (pro- 
vided, of course, that there is no correlation between the means, and the 
differences, of duplicates). 

I have used the terms accuracy, precision, and reproducibility (in- 
stead of repeatability), in the sense of Cochran & Cox’s explanation in 
Experimental Designs (New York, 1950, p. 16). 
ANSWER: I shall assume that each pair of determinations is a random
sample from a normal distribution with mean μ and standard
deviation σ. Furthermore, I assume that σ is the
same for all pairs but that μ may characterize the individuals, being
different for all or any number of them. This implies, as you say, that
there is no correlation between μ and σ. Finally, for illustration, I shall
assume that you have 35 pairs of determinations. In these circumstances,
I should calculate the analysis of variance:


Source of Variation        Degrees of Freedom        Mean Square
Determinations                     35                     D


The mean square D is an estimate of σ², the common variance of the
determinations on all individuals. I would think of 1/σ as a measure of
the precision of the assay method. There is no way to test the precision
unless you have some hypothesis about σ; such, for example, that σ is not
greater than some specified number. With the estimate of σ², you can
set a confidence interval on μ for any individual. Or, you can calculate
the number of determinations necessary to keep x̄ - μ within specific
bounds.

From my viewpoint, the differences between determinations are
irrelevant except as a computational device; D may be calculated, as
you say, by S(d²)/2N.
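The equivalence of the two routes to D can be seen in a few lines of Python. The duplicate values below are hypothetical and serve only as an illustration; the sketch computes S(d²)/2N directly and compares it with the pooled within-pair mean square, in which each pair contributes one degree of freedom.

```python
from statistics import variance

# HYPOTHETICAL duplicate determinations, for illustration only.
pairs = [(10.2, 10.6), (8.9, 8.5), (12.1, 12.4), (9.7, 9.7), (11.0, 10.3)]
N = len(pairs)

# Shortcut named in the query: sum of squared duplicate differences over 2N.
shortcut = sum((a - b) ** 2 for a, b in pairs) / (2 * N)

# Pooled within-pair mean square: the sample variance of each pair has one
# degree of freedom, so averaging over the N pairs pools N degrees of freedom.
pooled = sum(variance(pair) for pair in pairs) / N

print(shortcut, pooled)
```

The sign of each difference never enters, since only squares are used, which is why the lack of an order within a pair causes no difficulty.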


QUERY 93: A question which comes up frequently in radiation
experiments concerns the estimation of the ratio of the spontaneous
to the induced mutation rate in a particular species. If p_1 is
the probability of occurrence of a mutation in a control animal and p_2
is the probability of a mutation in an irradiated animal, the ratio in the
population is defined as

\[ R = \frac{p_1}{p_2 - p_1}. \]

For the most part, the occurrence of a mutation is a rare event, so that
many animals must be examined in order to identify even a few mutants.
In this special case how may one place confidence limits on R given
estimates of p_1 and p_2?


ANSWER: If x and y represent the outcomes of the examinations of
a control animal and a treated animal, respectively, then
the four possible outcomes are

(x, y) = (0, 0)
(x, y) = (0, 1)
(x, y) = (1, 0)
(x, y) = (1, 1),

where zero and one indicate absence and presence of a mutation. Since
outcomes (0, 0) and (1, 1) provide little information about the difference
in mutation rates, let us consider only the outcomes (0, 1) and (1, 0).
We have

\[ \text{Prob}\{(0, 1)\} = (1 - p_1)\, p_2, \]
\[ \text{Prob}\{(1, 0)\} = p_1 (1 - p_2). \]

If there are t_1 outcomes of type (0, 1) and t_2 outcomes of type (1, 0), we
can construct a new binomial population with parameter

\[ p = \frac{(1 - p_1)\, p_2}{(1 - p_1)\, p_2 + p_1 (1 - p_2)}, \]

and can consider the t = t_1 + t_2 outcomes as a sample of size t from the
binomial population so defined. By means of existing tables, we can
place confidence limits on p, and from these we can derive confidence
limits for R with no difficulty. Neglecting the term p_1 p_2, which for all
practical purposes is zero, we have

\[ p = \frac{1 + R}{1 + 2R}, \]

from which we find

\[ R = \frac{1 - p}{2p - 1}. \]

By means of this formula confidence limits for p are easily converted into
limits for R.

These results are valid under the assumption that repetitions of the
experiment would consist of taking paired observations until the total
number of mutations reaches t. The formulas are approximately correct
for the case in which observations are not taken in pairs, providing p_1
and p_2 are very small. Since this is true of the situation described in the
query, these formulas may be used.
In one experiment about 20,000 animals in each group were examined,
resulting in one mutation in the control group and 14 mutations in the
treated group. Thus

t_1 = 14
t_2 = 1
estimate of p = 14/15
95 per cent conf. limits on p = .681, .9983 (Fisher and Yates, Table VIII, 1)
95 per cent conf. limits on R = .0017, .88

A. W. KIMBALL
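The limits quoted in this answer can be approximated without tables. The Python sketch below is an illustration, not the method of the tables cited: it assumes that exact two-sided binomial limits of the Clopper-Pearson type are what is wanted, finds them by bisection, and converts them to limits on R by R = (1 - p)/(2p - 1). All function names are hypothetical.

```python
from math import comb


def upper_tail(t, x, p):
    """P(X >= x) for X distributed as Binomial(t, p)."""
    return sum(comb(t, j) * p**j * (1 - p)**(t - j) for j in range(x, t + 1))


def solve_increasing(f, target, lo=0.0, hi=1.0, steps=100):
    """Bisection for f(p) = target, where f is increasing in p on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_limits(x, t, alpha=0.05):
    """Two-sided exact limits for p, given x outcomes of one type among t."""
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: upper_tail(t, x, p), alpha / 2)
    # P(X <= x) = alpha/2 is the same condition as P(X >= x + 1) = 1 - alpha/2.
    upper = 1.0 if x == t else solve_increasing(
        lambda p: upper_tail(t, x + 1, p), 1 - alpha / 2)
    return lower, upper


def ratio_from_p(p):
    """R = (1 - p) / (2p - 1), meaningful for p > 1/2."""
    return (1 - p) / (2 * p - 1)


t1, t2 = 14, 1                                   # counts from the experiment above
p_low, p_high = exact_limits(t1, t1 + t2)
r_low, r_high = ratio_from_p(p_high), ratio_from_p(p_low)   # R falls as p rises
print(round(p_low, 4), round(p_high, 4), round(r_low, 4), round(r_high, 2))
```

Because R decreases as p increases, the upper limit for p gives the lower limit for R and conversely.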
ABSTRACTS
THE BIOMETRIC SOCIETY—BRITISH REGION 
Abstracts of papers for meeting on Thursday, 29 November 1951 


162 D. H. CHITTY. Assumptions in the capture-recapture method 
of estimating populations. 

Estimates of numbers in a natural population may be obtained 
through marking, releasing and resampling. Two types of error in this 
method are the statistical errors of estimation and those arising from the 
behaviour of the animals themselves. This paper deals with the latter 
and shows, with special reference to small mammals, that the sample
may be non-random and that death rates may not be constant,
independent of age or independent of the presence of the mark of
identification.


163 K.D. TOCHER. Tests for restraints on sets of linear regression 
lines. 


This paper discusses the possible relations between a set of regression
lines of the form E(y) = m_i x + c_i (i = 1, 2, ..., p). The case of linear
relations can be tackled by the analysis of variance. Well-known
examples of this are: (a) m_i = constant, (b) c_i = constant (constant
intercept on y axis). The main purpose of the paper is to indicate a
method of adapting the analysis of variance test to tests of non-linear
relations. The example used is that of constructing a test of the
hypothesis that intercepts on the x axis are coincident; the significance
level obtained from the test is an upper bound to the probability of
error of the first kind.


164 F. YATES. Trials of coffee progenies. 


The arrangement of progeny trials of such plants as coffee in which
vegetative production is not practicable is discussed. The general
problem is to select the best progenies and the best individual plants
within progenies. Estimates are also required of the genetic variance
within progenies for the various groups of progenies, and also of the
genetic variance between progenies. This is made possible by the
existence of plants from duplicated haploid homozygous lines which are
similar in habit of growth and yield characteristics to the lines to be
tested.
Abstracts of papers presented at meeting held jointly with the American
Association for the Advancement of Science and the American Society of 
Naturalists, Philadelphia, Pa., Dec. 27-28, 1951. 


165 MARDELLE L. CLARK and FRANCIS X. LYNCH (Armed
Forces Institute of Pathology, Washington, D. C.). Incidence
and Severity of Atomic Bomb Injuries in Relation to Distance
From Explosion and Type of Protection: A study of the data from 6343
case histories from Hiroshima.


This report is based on case histories of 6343 injured and uninjured 
persons who were within 4000 meters (13,120 feet) of ground-center of 
the atomic bomb in Hiroshima, Japan. Of these, 1207 persons were not 
injured. The incidence and severity of injuries of the 5136 victims who 
received medical attention or treated themselves are studied in relation 
to distance from ground-center and type of protection or lack of pro- 
tection (outdoors, shielded or unshielded, and indoors in buildings of 
light or of heavy construction). The preponderance of mechanical
injuries among victims who were indoors and the absence from the
sample of many others so injured who were trapped and consumed by
fire are considered in conjunction with published statements by certain
qualified architects and engineers that resistance to blast of American 
residences in general would not differ markedly from those in Hiroshima 
and Nagasaki. Since even minor mechanical injuries sustained indoors 
may prevent escape from the disaster area, careful consideration should 
be given to the hazards attendant upon collapse of buildings in the event 
of an atomic attack, with instruction and training in proper methods of 
self-protection out of doors, with a view to reducing the number of 
mechanical injuries which in conjunction with fire, constitute the great- 
est single hazard of atomic attack. Training in outdoor protection 
would at the same time minimize the risks of flash burns which are less 
disabling insofar as escape from the disaster area is concerned. Radia- 
tion hazards are great both indoors and outdoors in the zones of
instantaneous gamma radiation, but the delayed effects of radiation should
enable the victim to escape to secondary zones where medical attention
may be obtained.


EASTERN NORTH AMERICAN REGION 


Abstracts of papers presented at meeting held jointly with The Biometrics 
Section of the American Statistical Association and The Institute of 
Mathematical Statistics, Boston, Mass. Dec. 27-29, 1951. 


166 JOSEPH BERKSON, M.D., D.Sc., and ROBERT P. GAGE,
M.S. (Division of Biometry and Medical Statistics, Mayo Clinic,
Rochester, Minnesota). Survival Curve for Cancer Patients
Following Treatment.
The survivorship curve of patients who have been treated for some
specified cancer is represented by the model equation

\[ l_T = c\, l_0 + (1 - c)\, l_0\, l_c \qquad (1) \]

where l_T is the probability of survival in the total population of treated
patients; c is the fraction of this population “cured,” this portion of the
population being subject, with passage of time, to successive “normal”
death rates q_0 = 1 - p_0, from diseases other than the treated cancer;
l_0 = Π p_0 is the probability of survival in a population free of the
specified cancer; (1 - c) is the fraction of the population not cured, this
portion of the population being subject to the probabilities of death q_0,
and also q_c = 1 - p_c, probabilities of death from the specified cancer;
l_c = Π p_c is given by the function l_c = e^{-βt}, where t is the time after
treatment, and β is a parameter.

The values for l_0 are obtained from published mortality tables applying
to the general population. The equation (1) therefore contains only
two adjustable parameters: c, the fraction of treated patients “cured,”
and β, which is the net force of cancer mortality in the uncured portion
of the treated population. These are estimated by a least-squares
procedure. From (1) the expectation of life is calculated and compared
with normal expectation. The values of c, β, and expectation of life in
comparison with normal, are used as measures of the effectiveness of
therapy.

The fit of equation (1) was illustrated by application to two series of
patients who had been treated at the Mayo Clinic for cancer of the
breast.
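A minimal Python sketch of equation (1) may help fix ideas. Everything numerical in it is hypothetical: the normal survival values stand in for a published mortality table, and the values of c and β are arbitrary choices for illustration, not estimates from the abstract. The symbols l_T, l_0 and l_c denote total, normal and cancer-specific survival, with l_c = exp(-βt).

```python
from math import exp


def survival_total(l0, c, beta, t):
    """Equation (1): l_T = c * l_0 + (1 - c) * l_0 * l_c, with l_c = exp(-beta * t)."""
    return c * l0 + (1 - c) * l0 * exp(-beta * t)


# HYPOTHETICAL normal survival probabilities l_0 at selected years after treatment.
normal_survival = {0: 1.00, 1: 0.98, 5: 0.90, 10: 0.78}
c, beta = 0.30, 0.25     # HYPOTHETICAL "cured" fraction and force of cancer mortality

curve = {t: survival_total(l0, c, beta, t) for t, l0 in normal_survival.items()}
for t, value in curve.items():
    print(t, round(value, 4))
```

As t grows the second term dies away and the curve approaches c times the normal survival, which is how the "cured" fraction shows itself in the tail of the survivorship curve.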


Abstracts of papers presented at meeting held jointly with The Institute of 
Mathematical Statistics, Virginia Polytechnic Institute, Blacksburg, Vir- 
ginia, March 19-21, 1952. 


167 A. R. SEN (North Carolina State College). Linear Unbiased
Estimates in Probability Sampling and the Role of Optimum
Estimates.


In an earlier paper presented at Minneapolis (abstract in Econo- 
metrics 20, No. 1, 1952), the author developed the theory and application 
for the selection of two primary sampling units (p.s.u.’s) with probability 
proportional to some measure of their sizes (p.p.s.) in the estimation of 
farm characteristics of North Carolina. The present paper treats linear 
unbiased estimates with the following probability schemes:

(i) Symmetric probabilities in which the probability of selecting the
two units does not depend on the order of selection. The
probability functions in this case are proportional to non-negative
functions of this type


_ 1 1 
(X; + Xj), xx] x X; + XX; etc., 


X; , X; being some measures of the sizes of the two units and 
X= 
‘ 


(ii) Asymmetric probabilities in which the probability of selecting
the two units does depend on the order of selection. Examples
of asymmetric probability functions are non-negative functions
of the type

$$\frac{X_i X_j}{X (X - X_i)}.$$

Asymmetric functions suffer from the drawback that the result-
ant estimates, in general, are very inefficient compared with those
for the symmetric class.


Minimum variance procedures have been derived for both probability 
schemes. The optimum estimate in the asymmetric case involves 
nuisance parameters (i.e., population values of other units of the stra-
tum). Either these parameters or certain functional relationships be-
tween them must be known a priori.

This theory is being applied to farm census data of North Carolina. 


168 WALTER D. FOSTER (West Virginia University). A Use of 
the Coefficient of Variation in Nutrition Surveys. 


In estimating the mean dietary intake of a group of nutritional sur- 
veys one of the oldest and most troublesome problems has been the 
number of days needed to study the group with its companion question, 
how many subjects. 

Since it has been almost impossible to avoid consecutive days in these 
surveys, the use of components of variance in this phase of the problem 
ran headlong into the requirement of independence. Evidence that 
consecutive days in diet estimates may be considered as independent is 
available. In answering the ever-faithful query of how many subjects, 
a use of the coefficient of variation made it possible to present a graphic
solution applicable to persons in a wide range of age, location and living
conditions in the Northeast. 


169 PAUL MEIER (Princeton University). The Estimation of
Error in Simple Lattice Designs. 


The use of inter-block information in the estimation of means in
lattice designs requires the use of estimated weights. In general it is 
assumed that the resulting error estimate is nearly unbiased. Allowance 
for the inaccuracy of weighting has been made by adjusting the number 
of degrees of freedom. 

In this paper it is shown that the usual error estimate has a consider- 
able negative bias in small experiments. Approximate correction terms 
are given for the general case, with special attention to experiments with 
two and four replications. A numerical calculation for the 5 × 5 lattice
with two replications shows that the maximum bias is approximately 
10% for the usual estimate and less than 2% for the adjusted estimate. 


170 K. R. NAIR (Institute of Statistics, The Consolidated University
of North Carolina). A Note on Rectangular Lattices. 


Consider the n-ple rectangular lattice for p(p — 1) treatments in 
blocks of (p — 1) plots and with every treatment replicated n times. 
In the familiar notation used for incomplete block designs: 


$$v = p(p - 1), \quad k = p - 1, \quad r = n, \quad b = np \quad (2 \le n \le p).$$


When n = 2, 3 and p respectively, Harshbarger has called these designs 
the simple, triple and near balance rectangular lattices. 

It was shown by the author (Biometrics, Vol. 7, No. 2, June 1951, 
pp. 145-154) that the simple rectangular lattice is a partially balanced 
incomplete block (p.b.i.b.) design having four associate classes and that
the triple rectangular lattice is not a p.b.i.b. design except when p = 3 
and 4. 

In the present paper it is shown that, when n = (p — 1) and p 
respectively, the n-ple rectangular lattice is a p.b.i.b. design having 
three and two associate classes. It is also shown that the dual of the 
n-ple rectangular lattice is a p.b.i.b. design having two and three 
associate classes respectively when n = p and n < p. 
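
The parameter relations quoted in this abstract can be checked mechanically. The following is only a minimal sketch: the function name is hypothetical, and the range check assumes that n runs from 2 (simple lattice) up to p (near balance lattice), as the abstract's examples suggest. The counting identity vr = bk, which holds for any equireplicate design with equal block sizes, serves as a consistency check.

```python
def rectangular_lattice_parameters(p, n):
    """Return (v, k, r, b) for the n-ple rectangular lattice.

    Hypothetical helper: the formulas are those quoted in the abstract,
    v = p(p - 1), k = p - 1, r = n, b = np.
    """
    if not (2 <= n <= p):
        raise ValueError("expected 2 <= n <= p")
    v = p * (p - 1)  # number of treatments
    k = p - 1        # plots per block
    r = n            # replications of each treatment
    b = n * p        # total number of blocks
    # Total plots counted two ways: treatments x replications = blocks x block size.
    assert v * r == b * k
    return v, k, r, b


# Simple (n = 2), triple (n = 3) and near balance (n = p) lattices for p = 5.
for n in (2, 3, 5):
    print(n, rectangular_lattice_parameters(5, n))
```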


171 PAUL IRICK (Purdue University). Sampling Distributions for 
Dispersion Statistics. 


Although analytical methods are ordinarily more powerful for de- 
termining specific results, it is felt by the author that a geometrical 
approach to this problem can be interesting, fruitful, and complete in 
its generality. Let the non-negative intervariate ranges in an ordered
sample be $r_1, \ldots, r_{n-1}$. In the $(n-1)$-dimensional $r$ space there is a unique point
density, $\delta(r)$, associated with each sampled population, $f(x)$. Then a
dispersion statistic, $g(r)$, determines contours in the $r$ space such that
integration of $\delta(r)$ over a $g$ contour produces the frequency function for $g$.
Cases are considered where $g$ is linear or $g^2$ quadratic, and where $f(x)$
has finite or unlimited range. Using the machinery of the method, a
number of sampling distributions can be written down immediately.
Some new results have been obtained in special cases. The method also
invites some plausible approximations for some of the more difficult
combinations of $f(x)$ and $g(r)$.


172 CARL F. KOSSACK and LESTER L. HELMS (Purdue Uni-
versity). On the Approximation of Sampling Distributions by
Punch Card Method.


This paper presents a procedure for obtaining empirical distributions,
by punch card methods, of statistics for which the exact distribution or 
a usable approximation has not been found. The mechanization of 
random sampling of a univariate population has been described and 
extended to random sampling of a correlated multivariate population 
whose covariance matrix is given. This procedure has been applied to 
Wald’s classification statistic in the univariate case, and the results noted. 
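
In present-day terms the procedure described here is a Monte Carlo sampling experiment. The sketch below is not the authors' punch card method; it only illustrates the same idea under stated assumptions: the population is taken to be multivariate normal (the abstract does not name the distribution), correlated draws are produced from a Cholesky factor of the given covariance matrix, and the sample correlation coefficient stands in for the statistic of interest. All function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def sample_correlated_normal(L, rng):
    """One draw from a zero-mean normal population with covariance L L^T (assumed normal)."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]


def empirical_distribution(statistic, cov, sample_size, n_samples, seed=0):
    """Sorted values of `statistic` over repeated random samples."""
    rng = random.Random(seed)
    L = cholesky(cov)
    values = []
    for _ in range(n_samples):
        sample = [sample_correlated_normal(L, rng) for _ in range(sample_size)]
        values.append(statistic(sample))
    return sorted(values)


def sample_correlation(sample):
    """Product-moment correlation of a bivariate sample (illustrative statistic)."""
    n = len(sample)
    mx = sum(x for x, _ in sample) / n
    my = sum(y for _, y in sample) / n
    sxy = sum((x - mx) * (y - my) for x, y in sample)
    sxx = sum((x - mx) ** 2 for x, _ in sample)
    syy = sum((y - my) ** 2 for _, y in sample)
    return sxy / math.sqrt(sxx * syy)


values = empirical_distribution(
    sample_correlation, [[1.0, 0.5], [0.5, 1.0]], sample_size=20, n_samples=500
)
# Empirical 5% and 95% points of the statistic.
print(values[int(0.05 * len(values))], values[int(0.95 * len(values))])
```

Percentage points read off the sorted list play the role of the tabulated empirical distribution; increasing `n_samples` reduces their sampling error.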


173 R. C. BOSE and K. R. NAIR (Institute of Statistics, The Con-
solidated University of North Carolina). Resolvable Incomplete
Block Designs with Two Replications.


Incomplete block designs in which the blocks can be grouped in such
a way that each group contains a complete replication, may be called
resolvable designs. They are useful from the point of view of recovery
of inter-block information. It is therefore important to investigate
resolvable designs involving a few replications. In this paper we con-
sider a class of resolvable designs with two replications, which contains
as a special case the well known square and rectangular lattices with two
replications. Given a symmetrical balanced incomplete block design
with $u$ treatments, and $r$ replications in which each pair occurs $\lambda$ times,
we can use the incidence matrix $(n_{ij})$ of this design to form a design of
our class in the following way. Take a $u \times u$ square scheme, and in the
cell $(i, j)$ put $x$ new treatments when $n_{ij} = 1$, and $y$ new treatments when
$n_{ij} = 0$. The total number of treatments obtained in this way is
$v = u[rx + (u - r)y]$. The design is now constructed by taking the rows
of the scheme for the blocks of the first replication, and the columns of
the scheme for the blocks of the second replication. It has been shown
that both the intra and inter-block analysis can be carried out in a
simple manner. The necessary formulae have been given, and the
computational procedure illustrated by working out a numerical example.
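
The construction described above can be sketched in a few lines. This is an illustration under assumptions rather than the authors' worked example: the symmetrical design used is the seven-treatment design with three replications in which each pair occurs once, generated here from the difference set {0, 1, 3} modulo 7, and the function names and the choice of two and one new treatments per cell are hypothetical.

```python
def fano_incidence():
    """Incidence matrix of the symmetrical design with 7 treatments, 3 replications."""
    u = 7
    # Block j contains treatments j, j + 1, j + 3 (mod 7).
    blocks = [{(j + d) % u for d in (0, 1, 3)} for j in range(u)]
    return [[1 if i in blocks[j] else 0 for j in range(u)] for i in range(u)]


def two_replicate_design(incidence, x, y):
    """Fill a u x u scheme cell by cell; rows and columns give the two replications."""
    u = len(incidence)
    cells = [[[] for _ in range(u)] for _ in range(u)]
    next_label = 0
    for i in range(u):
        for j in range(u):
            # x new treatments where the incidence entry is 1, y where it is 0.
            count = x if incidence[i][j] == 1 else y
            cells[i][j] = list(range(next_label, next_label + count))
            next_label += count
    rows = [[t for j in range(u) for t in cells[i][j]] for i in range(u)]
    cols = [[t for i in range(u) for t in cells[i][j]] for j in range(u)]
    return next_label, rows, cols


u, r = 7, 3
x_new, y_new = 2, 1  # illustrative choice
v, first_replication, second_replication = two_replicate_design(
    fano_incidence(), x_new, y_new
)
# Agrees with the formula for the total number of treatments.
assert v == u * (r * x_new + (u - r) * y_new)
```

Each treatment appears in exactly one row block and one column block, so the rows form one complete replication and the columns another, which is what makes the design resolvable.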


174 R. A. BRADLEY and M. E. TERRY (Virginia Polytechnic
Institute). Rank Analysis of Incomplete Block Designs I—
The Method of Paired Comparisons.


True preferences or ratings $\pi_{1u}, \ldots, \pi_{tu}$, $\sum_{i=1}^{t} \pi_{iu} = 1$, are assumed to
exist for $t$ treatments in the $u$th of $g$ groups of experimental data in an
experiment involving paired comparisons. For the $u$th group, the prob-
ability that treatment $i$ is “better” than treatment $j$ when they appear
in a pair is postulated to be $\pi_{iu}/(\pi_{iu} + \pi_{ju})$.

Three tests of hypotheses are available and estimates of the treat-
ment ratings may be obtained. The tests use likelihood ratio statistics
to test

(a) $H_0$: $\pi_{iu} = 1/t$ against $H_1$: $\pi_{iu} = \pi_i$ for all $u$,

(b) $H_0$: $\pi_{iu} = 1/t$ against $H_1$: $\pi_{iu} \ne 1/t$ for some $i$ and $u$, and

(c) $H_0$: $\pi_{iu} = \pi_i$ for all $u$ against $H_1$: $\pi_{iu} \ne \pi_i$ for some $i$ and $u$.


Small sample distributions with tables are available for tests (a) and 
(b). In all three tests limiting distributions are shown to be in the form 
of chi square. 
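
The paired-comparison model lends itself to a short numerical illustration. The sketch below treats a single group only and is not the authors' computing procedure: the ratings are fitted by a simple fixed-point iteration on the likelihood equations, which is an assumption of this example, and the win counts and function names are invented for illustration. The statistic returned is twice the log likelihood ratio for the hypothesis that all ratings equal 1/t.

```python
import math


def preference_probability(ratings, i, j):
    """Probability that treatment i is preferred to treatment j."""
    return ratings[i] / (ratings[i] + ratings[j])


def log_likelihood(ratings, wins):
    """wins[i][j] = number of pairs in which i was preferred to j."""
    t = len(ratings)
    return sum(
        wins[i][j] * math.log(preference_probability(ratings, i, j))
        for i in range(t)
        for j in range(t)
        if i != j and wins[i][j] > 0
    )


def fit_ratings(wins, iterations=200):
    """Fixed-point iteration for the maximum likelihood ratings (assumed method)."""
    t = len(wins)
    ratings = [1.0 / t] * t
    for _ in range(iterations):
        updated = []
        for i in range(t):
            total_wins = sum(wins[i][j] for j in range(t) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (ratings[i] + ratings[j])
                for j in range(t)
                if j != i
            )
            updated.append(total_wins / denom)
        norm = sum(updated)
        ratings = [value / norm for value in updated]  # ratings sum to one
    return ratings


def likelihood_ratio_statistic(wins):
    """2 * [log L(fitted ratings) - log L(all ratings equal to 1/t)]."""
    t = len(wins)
    fitted = fit_ratings(wins)
    equal = [1.0 / t] * t
    return 2.0 * (log_likelihood(fitted, wins) - log_likelihood(equal, wins))


# Invented data: three treatments, every pair compared ten times.
wins = [[0, 7, 8], [3, 0, 6], [2, 4, 0]]
print(fit_ratings(wins), likelihood_ratio_statistic(wins))
```

The iteration assumes every treatment wins at least once; a treatment with no wins would be driven to a rating of zero.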


175 D. B. DUNCAN and R. C. RHODES (Virginia Polytechnic 
Institute). Multiple Regression with a Quantal Response. 


The problem considered is that of fitting a maximum likelihood 
multiple regression equation to data in which the response is quantal, the 
probit transformation is appropriate and the number, r, of independent 
regression variates is not small. 

Iterative methods, for example the Bliss-Fisher method, are available 
but these have been developed mainly for the case r = 1 and rapidly 
become impractical for cases r > 2. 

A method is developed based on (i) the approximation of the weighted
deviations of the working probits from the provisional probits by linear
functions of the provisional probits and (ii) the replacement of the
independent $x$ variates throughout most of the procedure by a linear
function of them, termed a composite regression variate. These devices
lead to a simple procedure and result in an estimated 70 to 90% saving
in work.


176 R. A. BRADLEY and M. E. TERRY (Virginia Polytechnic
Institute). Rank Analysis of Incomplete Block Designs II—
The Method for Blocks of Three (Preliminary Report).


The extensions of the Rank Analysis of Incomplete Block Designs I—
The Method of Paired Comparisons to blocks of size three are presented.

As before, true preferences or ratings $\pi_{1u}, \ldots, \pi_{tu}$, $\sum_{i=1}^{t} \pi_{iu} = 1$, are
assumed to exist for $t$ treatments in the $u$th of $g$ groups. For the $u$th
group the probability that treatment $i$ obtains top ranking in the pres-
ence of treatments $j$ and $k$ is $\pi_{iu}/(\pi_{iu} + \pi_{ju} + \pi_{ku})$ and the probability
that treatment $j$ obtains rank 2, given that $i$ had rank 1, is $\pi_{ju}/(\pi_{ju} + \pi_{ku})$.
The three tests of hypotheses listed in the first paper are again
developed. Tables are under preparation but are not yet available or
complete.
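
The two probabilities given for blocks of three multiply to give the probability of a complete ranking. A minimal sketch for one group, with invented ratings and a hypothetical function name, confirms that the six possible rankings of a block have probabilities summing to one.

```python
from itertools import permutations


def ranking_probability(ratings, order):
    """Probability of the ranking (first, second, third) within one block of three."""
    first, second, third = order
    # Probability that `first` obtains top ranking in the presence of the other two.
    top = ratings[first] / (ratings[first] + ratings[second] + ratings[third])
    # Probability that `second` obtains rank 2, given the treatment ranked first.
    runner_up = ratings[second] / (ratings[second] + ratings[third])
    return top * runner_up


# Invented ratings for a single block of three treatments.
ratings = {"A": 0.5, "B": 0.3, "C": 0.2}
probabilities = {
    order: ranking_probability(ratings, order) for order in permutations(ratings)
}
for order, probability in probabilities.items():
    print(order, probability)
```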


INTERNATIONAL BIOMETRIC SYMPOSIUM


The third international session of The Biometric Society was a
symposium on “Biometric Problems in the Prediction and Estimation
of the Growth of Plants in Tropical and Sub-Tropical Regions.” It
was held in the new building of the Indian Statistical Institute in Cal-
cutta on the 17th and 18th of December 1951.

The first meeting on the 17th of December convened at 5:30 p.m. 
under the Chairmanship of Professor A. Linder with about 150 persons 
in attendance. The first paper by Dr. J. O. Irwin was a “Contribution 
to a Discussion on Crop Predictions,” which gave a critical and historical 
review of the work in this field. Dr. F. Yates, who spoke next on “Crop 
Prediction in England,” described the methodology of estimating and
predicting crops with examples drawn from the British experimentation.
The third paper was written jointly by Messrs. J. M. Sengupta, I. M.
Chakravarti and D. Sarkar on “Sampling Experiments for the Estima-
tion of Cinchona Yield in Madras: 1950.” In the absence of Mr.
Sengupta, the practical aspect of the problem was presented by Mr.
S. C. Sen, and the technical aspects by Mr. I. M. Chakravarti. Messrs.
K. S. Banerjee, F. Yates, J. J. Chinoy, R. A. Fisher, J. O. Irwin and
J. B. S. Haldane participated in the discussion which followed.

The second meeting convened on the 18th of December at 9:30 a.m.
under the Chairmanship of Dr. F. Yates and was attended by about 100 
persons. The opening discussion continued that started the previous 
evening, with Messrs. C. R. Rao, F. Yates, J. J. Chinoy, N. K. Rao and 
K. Kishen participating. The first rapporteur was Professor M. E. 
Belz, who spoke on “Recent Experiments Relating to Crop and Pastures
in Australia.” He was followed by Professor P. C. Mahalanobis whose
paper concerned some problems arising in crop-cutting experiments, 
specially the size of cut and border effects. Messrs. R. A. Fisher, G. 
Rasch, P. V. Sukhatme, F. Yates, C. R. Rao and P. C. Mahalanobis 
took part in the discussion which followed. 


British Region. At the annual meeting of the Region in London on 
November 29, 1951, the following officers were elected for 1952: Vice- 
President—Dr. F. Yates; Treasurer—Dr. A. R. G. Owen; Secretary— 
Dr. D. J. Finney; Committee (1952-54)—Prof. R. A. Fisher and Dr. 
R. R. Race. Following the business session papers were presented by 
D. H. Chitty, K. D. Tocher and F. Yates. Abstracts of these papers 
appear on page 89 of this issue. 


Région Française. At the meeting of the Region in Paris on December 12,
1951, Dr. Maurice Cara spoke on the following subject: Some relations
between respiratory function values and various physiological and
pathological factors.


ENAR Meetings. Through ENAR, The Biometric Society participated 
in two programs at the annual meeting of the American Public Health 
Association, held in San Francisco, October 29 to November 2, 1951. 
The first of these, given jointly with the Statistics Section of the APHA, 
Jacob Yerushalmy presiding, concerned Methodology in Follow-up 
Studies with the following papers: Practical implications of certain 
stochastic models on different methods of follow-up studies, by Evelyn 
Fix. The effect of incomplete information on follow-up studies, by A. P. 
Iskrant and Q. R. Remein. Survival rates of patients treated for pri- 
mary carcinoma of the bladder, by M. L. Clark, J. H. Gerende, and 
M. B. Peeples. Estimates of effectiveness of cancer therapy from records 
of mortality following treatment, by J. Berkson and R. P. Gage. 

The second session, given jointly with the Epidemiology and Sta- 
tistics Section with C. E. Smith in the chair, considered Methodology in 
Chronic Disease Morbidity Studies with the following program: Influ- 
ence of the dynamic character of a chronic disease on the interpretation 
of morbidity rates, by P. E. Sartwell and Margaret Merrell. The fre- 
quency and geographic distribution of multiple sclerosis as indicated by 
mortality statistics and morbidity surveys in the United States and 
Canada, by L. T. Kurland. California morbidity research project, by 
Arthur Weissman. Preventive aspects of chronic disease, by M. L. 
Levin. A. A. Ciocco opened the discussion.

In December 1951, the Region met jointly with the American Associa-
tion for the Advancement of Science, Sections A (Mathematics) and H
(Anthropology), and with the American Society of Naturalists in 
Philadelphia, Pa. Three sessions were arranged under the direction of 
chairman J. N. Spuhler and committee. The morning session on 
December 27 was under the chairmanship of H. Levene and featured 
talks on mathematical biology by N. Rashevsky, A. Shimbel, and 
G. Karreman. A symposium that afternoon on “The Use of Statistical 
Models to Interpret Data on Human Population Genetics,” with M. W.
Smith presiding, was devoted to papers by C. C. Li, J. V. Neel, B. Glass, 
and J. N. Spuhler and D. J. Hager, H. Levene opening the discussion. 
The third session, on December 28, was chaired by M. Whittinghill and 
featured a paper on radiation injuries following the atomic bomb in 
Hiroshima by M. L. Clark and F. X. Lynch, and on the detection of 
gonial crossing over by M. Skibinsky, with discussion by J. A. Rafferty
and R. E. Comstock.

The fourth annual meeting of the Region was held in Boston on
December 27-29, 1951, jointly with the Biometrics Section of the
American Statistical Association and with the Institute of Mathematical 
Statistics. At the Regional business meeting on December 29, the fol- 
lowing officers were named for 1952: Vice-President—H. W. Norton; 
Secretary-Treasurer—W. T. Federer; members of the Regional Committee
—D. B. Duncan and H. C. Batson. The scientific program consisted of
ten sessions. On Thursday, December 27, sessions were held on “Con-
tributed Papers” under the chairmanship of W. J. Youden with papers
by W. T. Federer and R. K. McMillan, D. S. Robson and A. J. King, 
K. R. Nair, and B. Harshbarger; on “Statistical Problems in Animal 
Experimentation” under the chairmanship of E. L. Green with papers 
by A. E. Brandt, C. P. Stroud, and J. Cornfield; and on “The Evaluation 
of Diagnostic Procedures” under the chairmanship of J. W. Fertig with 
papers by J. Yerushalmy, J. E. Dunn, and H. T. C. Wilkerson and dis- 
cussion by P. Meier, E. C. Hammond and W. F. Taylor. The sessions 
on Friday, December 28, were entitled “Statistical Evaluation of Clinical
Data” with Jane Worcester presiding and papers by J. Berkson, A. P.
Iskrant, and J. Cornfield and with discussion by Sidney Cutler; on
“Vital Statistics: International Statistical Needs in the Study of Man”
with P. M. Densen presiding and papers by K. Stowman, H. L. Dunn, 
and F. Linder and discussion by Harry Alpert and P. M. Densen; and 
on “Contributed Papers” conducted by H. W. Norton with papers by 
A. S. Littell, S. P. Carroll and W. G. Cochran, and D. F. Votaw, Sr.
The Saturday, December 29, sessions were on “Discrete Random Pro-
cesses and Actuarial Theory” under the chairmanship of Mortimer
Spiegelman with a paper by H. L. Seal and discussion by T. N. E. Gre- 
ville, J. E. Walsh, and H. W. Alexander; on “Morbidity Statistics” 
under the chairmanship of R. B. Reed with papers by B. D. Karpinos, 
B. S. Sanders, and M. Fraenkel and discussion by W. H. Haenszel, 
C. A. Bachrach, and M. Robins; on “A Review of Mathematical 
Biology” under the chairmanship of J. W. Tukey with papers by 
N. Rashevsky, A. Rapaport, G. Karreman, and A. Shimbel; and on 
“The Use of the Range” under the chairmanship of L. F. K. Randolph 
with papers by L. E. Moses and G. J. Resnikoff, F. Mosteller and J. 
Berkson, W. J. Dixon, and J. Moshman and discussion by J. F. Daly 
and W. R. Pabst, Jr. 


SPECIAL SESSION ON SURVEY SAMPLING, IOWA STATE
COLLEGE, P. V. Sukhatme to lecture 


During the first six weeks of the spring quarter, beginning March 27, 
1952, Dr. P. V. Sukhatme, now with the Food and Agriculture Organiza-
tion of the United Nations in Rome, Italy, will be Visiting Professor of 
Statistics at Iowa State College. While here he will give a series of 
daily lectures in intermediate and advanced survey sampling. These 
lectures were largely developed in the course of the 1950 and 1951 sum- 
mer sessions on sample surveys held at New Delhi on behalf of the 
Indian Society of Agricultural Statistics, although some more advanced 
material dealing with subsampling, systematic sampling and nonsamp-
ling errors will also be covered.

The first six weeks of the spring quarter are being designated as a 
special session in survey sampling. Glenn L. Burrows, Mathematical 
Statistician in the Bureau of Agricultural Economics, USDA, will
participate in the Short Course in Sampling by giving two seminar talks, 
“Sampling in two related populations” and “Consumer acceptance and 
preference in retail store experiments,” on April 16 and 17. Morris 
Hansen, Assistant Director for Statistical Standards, Bureau of the 
Census, will give a seminar talk, “On measurement of response errors,”
on April 30; one entitled “Sampling with variable probabilities” will be 
presented by William Hurwitz, Chief, Statistical Research Section of 
the Bureau of the Census, on May 1. Also on May 1, at a joint Statistics- 
Social Science Seminar, Hansen and Hurwitz will discuss “Application 
of statistical methods in the 1950 Censuses.” 


BIOSTATISTICS CONFERENCE, IOWA STATE COLLEGE, June 16- 
July 23, 1952 


A biostatistics conference has been scheduled for the first session of 
the 1952 Summer Quarter at Ames, sponsored by faculty members 
working in agriculture, biology and statistics at Iowa State College and
by The Biometric Society (ENAR). The subject matter of the five-week 
conference is arranged so that many who cannot attend the entire con- 
ference can advantageously come for one or more of the weeks. Iowa 
State College is giving the Conference financial support. Publication in
book form is intended. 

The plan of the program is that each morning a biologist will present 
a problem, outline the objectives, describe techniques suitable for the 
experiment and analysis. A paired statistician will discuss suitable 
experimental designs and statistical and mathematical methods for 
attacking the problem. These speakers will preside at a general discus- 
sion period of the same topic the same afternoon. Similar discussion 
periods proved quite helpful in clarifying the contents of papers pre- 
sented at the morning sessions in the Heterosis Conference at Ames 
the summer of 1950. Evening discussion groups or workshop meetings 
are also possible. 

The program is tentatively arranged in five somewhat separate
weekly units as follows:


First week: Development of Quantitative Biology 

Second week: Specification of Populations and Their Processes 

Third week: The Estimation of Populations 

Fourth week: The Estimation of Biological Effects 

Fifth week: Biomathematical Mechanisms Within the Individual and 
Species 


Invited speakers include: Edgar Anderson, Geoffrey Beall, Joseph 
Berkson, Chester I. Bliss, A. E. Brandt, Samuel Brody, R. E. Buchanan, 
C. West Churchman, C. C. Cockerham, Jerome Cornfield, C. W. Cotter- 
man, James F. Crow, D. B. DeLury, Peter Dews, Harold F. Dorn, 
Arthur Dutton, Walter T. Federer, R. A. Fisher, R. P. Gage, John W. 
Gowen, A. A. Hasel, P. G. Homeyer, John W. Hopkins, Harold Hotelling,
S. L. Isaacson, R. J. Jessen, O. Kempthorne, Herbert H. Kramer, 
Warren H. Leonard, Cecilie Leuchtenberger, P. Levene, J. L. Lush, 
Lloyd C. Miller, K. R. Nair, J. V. Neel, Jerzy Neyman, Thomas Park, 
Ernest Pollard, N. Rashevsky, F. J. Ryan, Leslie W. Scattergood, 
William J. Schull, John P. Scott, G. W. Snedecor, George F. Stewart, 
Clyde Stormont, P. C. Tang, D. J. Thompson, John W. Tukey, F. M. 
Wadley, Sewall Wright, Frank Yates and John A. Zoellner. 

It is expected that the Conference will be of interest to advanced 
undergraduates, graduate students, and to research workers in the 
various biological sciences and statisticians who are interested in 
statistics as a research tool. Some graduate credit in Statistics will be 
allowed for attendance and study during the Conference. 

Rooms will be available in the college dormitories at the usual rates. 
For more detailed information write: T. A. Bancroft, Director, Statis- 
tical Laboratory, Iowa State College, Ames, Iowa. 


INSTITUTE OF STATISTICS, SUMMER STATISTICS CONFER-
ENCE, June 16-July 25, 1952 


The Institute of Statistics of The Consolidated University of North 
Carolina with financial assistance from foundation funds is conducting 
an integrated series of Summer Conferences, June 16-July 25, 1952. 
These conferences are for consulting statisticians and research workers 
who are using statistics. The advisory services of R. A. Fisher, Uni- 
versity of Cambridge, England and Frank Yates, Rothamsted Experi- 
mental Station, Harpenden, Herts, England have been secured for the 
entire period. W. G. Cochran, Head, Department of Biostatistics, 
Johns Hopkins University, Baltimore, Maryland will serve in the same 
capacity July 7-25. 

The program has been planned by fourteen statisticians located 
over the United States. A wide range of topics is to be discussed.
The general plan is to have one or two papers presented each morn- 
ing. Mimeographed copies of most papers will be provided for the 
conferees. The afternoons will be free for recreation or small group 
discussions. Each evening the conference group will meet for further 
discussions. 

These conferences are to be held at Blue Ridge Assembly which is 
fifteen miles East of Asheville and three miles from Black Mountain, 
North Carolina. Attendance at these conferences is being limited to 
two weeks, since it is desirable to keep the discussion groups small. 
Priority for attendance is being given to research workers in the South- 
east. 


SUMMER SESSIONS IN BERKELEY, CALIFORNIA 


This year’s summer program at the Statistical Laboratory of the 
University of California, Berkeley, California, consists of two sessions, 
June 23—August 2 and August 4-September 13. The program includes 
two of the usual undergraduate courses, one in each session, as well as one
new course in each session. In the first session the new course being
offered is called: “Statistical methods of searching for causal relation-
ships.” The course is designed to acquaint the students with statistical
methods of approaching practical problems with particular reference to
correlation and causality and to the pitfalls which studies of this kind
frequently involve.

The faculty of the summer sessions will include Dr. Grace E. Bates, 
Assistant Professor, Department of Mathematics, Mt. Holyoke College, 
South Hadley, Mass., Professor J. Neyman, Dr. E. Fix, Dr. G. Kallian-
pur, Mr. L. LeCam, of the Statistical Laboratory, University of Cali-
fornia. Professor J. Neyman will be available for consultations on work
leading to higher degrees.


VIRGINIA POLYTECHNIC INSTITUTE STATISTICAL SUMMER 
SESSION, July 29-August 15, 1952 


The Department of Statistics and the Statistical Laboratory in 
cooperation with the Department of Mathematics and the Department 
of Industrial Engineering of the Virginia Polytechnic Institute will con- 
duct a special statistical summer session, July 29 to August 15, 1952. 
The program will be for graduate students, research workers, and 
technicians in government and industry. Special offerings will be given 
in the statistics of taste testing, bio-assay, sampling and in engineering 
research and production. For further details write the Department of 
Statistics, Virginia Polytechnic Institute, Blacksburg, Virginia. 


STORRS SUMMER SESSION 


The third meeting of the Summer Seminar in Statistics will take place 
on the campus of the University of Connecticut during the three weeks 
of August 4-22, 1952. As in previous years, informality and discussion
will be stressed, both in organized and accidental groups. There will be 
one or two seminar sessions each day and clinics on the treatment of 
submitted problems in the applications. 

The first week, August 4-8, will be devoted to the modifications of
statistical techniques appropriate for chemistry, and is being organized 
by Cuthbert Daniel, 116 Pinehurst Ave., New York, New York and 
W. L. Gore, duPont, Wilmington, Delaware.

The second week, August 11-15, will be divided into two parts. The 
latter part will be devoted to applications of minimax techniques and is 
being organized by J. L. Hodges, Committee on Statistics, University of 
Chicago, Chicago 38, Illinois. 

The third week, August 18-22, will be divided into two parts. The 
first part will be devoted to follow-up studies as they arise in medicine, 
and is being organized by Irwin Bross, Department of Biostatistics, 
School of Hygiene and Public Health, The Johns Hopkins University, 
615 North Wolfe Street, Baltimore 5, Maryland. The second part will be 
devoted to applications in actuarial work, and is being organized by 
Mortimer Spiegelman, Metropolitan Life Insurance Co., One Madison 
Avenue, New York 10, New York. 

Professor R. A. Fisher will be a member of the seminar during the first 
two weeks. 


Those interested in the subjects under discussion are invited to attend 
by the day, week or other period. (A nominal registration fee will be
collected). For further information or reservations for campus housing, 
write to the Secretary of the Seminar, Professor D. F. Votaw, 210 Leet 
Oliver Memorial Hall, Yale University, New Haven, Connecticut. 
Suggestions of problems which might be presented before the clinic may 
also be sent to Professor Votaw. 